Introduction
This analysis examines non-standardised data collected from GP surgeries to address two critical NHS operational questions:
- Has there been adequate staff and capacity in the networks?
- What was the actual utilisation of resources?
The focus centres on analysing appointment data (missed and attended) to identify patterns, trends, and drivers that can produce actionable recommendations for improving services and resource allocation.
Research / Exploratory Questions
- What is the date range and scope of the provided datasets?
- What is the distribution of locations, service settings, context types, national categories, and appointment statuses?
- Are there monthly/seasonal trends in appointments and cancellations?
- Which periods show the highest appointment volumes and cancellation rates?
- What is the typical appointment duration across different services?
- How do external factors correlate with appointment patterns?
Analytical Questions
- What are the primary drivers of missed appointments and can this be predicted?
- Are there capacity shortages and resource utilisation issues?
- What evidence-based recommendations can improve attendance rates?
- Can the NHS adjust staffing and scheduling to optimise both efficiency and outcomes?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
Categorical Ordering¶
# Define duration order
min_order = ['1-5 ', '6-10 ', '11-15 ', '16-20 ', '21-30 ', '31-60 ']
# Define Appointment Order
app_order = ['6','0','2','4','1','3','2']
# Define categorical order
day_order2 = ['Same Day', '1 Day', '2 to 7 Days', '8 to 14 Days',
'15 to 21 Days', '22 to 28 Days', 'More than 28 Days']
# Define day order
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Define your month order
month_order = ["August", "September", "October", "November", "December",
"January", "February", "March", "April", "May", "June", "July"]
Defined Functions¶
def format_plotly_chart(fig):
"""Simple Plotly formatting function - aesthetics only"""
return fig.update_layout(
xaxis_tickangle=45,
width=800,
height=400,
font_size=12,
showlegend=True,
plot_bgcolor='white',
paper_bgcolor='white',
xaxis=dict(
categoryorder='array',
categoryarray=month_order
)
)
# Define lead time order function that accepts a DataFrame
def lead_time_order(df):
lead_times = [
'Same Day',
'1 Day',
'2 to 7 Days',
'8 to 14 Days',
'15 to 21 Days',
'22 to 28 Days',
'More than 28 Days'
]
# Convert booking_lead_time to categorical with specified order
df['booking_lead_time'] = pd.Categorical(
df['booking_lead_time'],
categories=lead_times,
ordered=True
)
# Sort by the categorical column
return df.sort_values('booking_lead_time')
def pivot_function(
data,
group_by_column,
outcome_column,
count_column,
time_column,
target_outcome=None,
group_values=None,
flatten_columns=False
):
# Basic validation
for col in [group_by_column, outcome_column, count_column, time_column]:
if col not in data.columns:
raise ValueError(f"Column '{col}' not found in data.")
# Filter out null or 'Unknown' values
filtered_data = data.dropna(subset=[outcome_column, group_by_column])
filtered_data = filtered_data[
(filtered_data[outcome_column] != 'Unknown') &
(filtered_data[group_by_column] != 'Unknown')
]
# Optional filtering by group values
if group_values:
filtered_data = filtered_data[filtered_data[group_by_column].isin(group_values)]
# Create pivot table
pivot = pd.pivot_table(
filtered_data,
values=count_column,
index=time_column,
columns=[group_by_column, outcome_column],
aggfunc='sum'
)
if pivot.empty:
raise ValueError("The pivot table is empty after filtering. Check your filters or input data.")
# Auto-detect target outcome if not specified
if target_outcome is None:
target_outcome = sorted(pivot.columns.get_level_values(1).unique())[0]
# Add percentage columns
for group_value in pivot.columns.get_level_values(0).unique():
group_cols = pivot[group_value].columns
if target_outcome in group_cols:
total = pivot[group_value].sum(axis=1)
percentage = (pivot[group_value][target_outcome] / total * 100).round(2)
pivot[(group_value, f'{target_outcome} (%)')] = percentage
# Optional: flatten MultiIndex columns for easier export or plotting
if flatten_columns:
pivot.columns = ['_'.join(map(str, col)).strip() for col in pivot.columns.values]
return pivot
Standardise Plot Settings¶
def plot_format():
# Use Seaborn's color-blind-friendly palette
sns.set_palette("colorblind")
# Apply Seaborn's "darkgrid" style
sns.set_style("darkgrid")
# Set general figure aesthetics
plt.rcParams.update({
"figure.figsize": (7,4.67), # Figure size
"axes.titlesize": 14, # Title font size
"axes.labelsize": 12, # Label font size
"xtick.labelsize": 10, # X-axis tick font size
"ytick.labelsize": 10, # Y-axis tick font size
"legend.fontsize": 10, # Legend font size
"grid.color": "gray", # Grid color
"grid.linestyle": "--", # Dashed grid lines
"grid.alpha": 0.5, # Grid transparency
"axes.grid": True, # Enable grid
"axes.edgecolor": "black", # Axis border color
"axes.spines.top": False, # Hide top spine
"axes.spines.right": False, # Hide right spine
"xtick.direction": "out", # Outward ticks
"ytick.direction": "out",
"font.family": "DejaVu Sans", # Font choice (good for readability)
"font.size": 12,
})
# defined function for identifying 'unknowns'
def unknown_summary(df):
import pandas as pd
result = pd.DataFrame({
'Count': df.eq("Unknown").sum(),
'%': (df.eq("Unknown").sum() / len(df) * 100).round(1)
})
print(result.to_string())
Importation of three datasets¶
# Import and sense-check the actual_duration.xlsx data set as ad.
ad = pd.read_excel('actual_duration_clean.xlsx')
# Import the national_categories.xlsx data set as nc.
nc = pd.read_excel('national_categories_clean.xlsx')
# Import the appointments_regional file
ar = pd.read_excel('appointments_regional_clean.xlsx')
Data Validation¶
print(ad.shape[0])
print(nc.shape[0])
print(ar.shape[0])
137793 817394 596821
Merge will not be viable as datasets are different lengths
# Assess metadata
print(ad.dtypes)
print(nc.dtypes)
print(ar.dtypes)
sub_icb_code object sub_icb_code2 object icb_location_name object icb_ons_code object region_ons_code object appointment_date datetime64[ns] duration object appointment_count int64 dtype: object appointment_date datetime64[ns] icb_ons_code object icb_location_name object sub_icb_code object service_setting object context_type object national_category object appointment_count int64 appointment_month object dtype: object icb_ons_code object appointment_month object appointment_status object hcp_type object appointment_mode object booking_lead_time object appointment_count int64 dtype: object
Note: appointment month is an 'object' in appointments dataset.
# Check the min and max dates.
print(ad['appointment_date'].min())
print(ad['appointment_date'].max())
print(nc['appointment_date'].min())
print(nc['appointment_date'].max())
print(ar['appointment_month'].min())
print(ar['appointment_month'].max())
2021-12-01 00:00:00 2022-06-30 00:00:00 2021-08-01 00:00:00 2022-06-30 00:00:00 2020-01 2022-06
Note: start dates differ in the three datasets. Appointments were scheduled between August 21' and June 22'
Data Cleaning (all datasets)¶
# Check for missing values
print(ad.isnull().sum())
print(nc.isnull().sum())
print(ar.isnull().sum())
sub_icb_code 0 sub_icb_code2 0 icb_location_name 0 icb_ons_code 0 region_ons_code 0 appointment_date 0 duration 0 appointment_count 0 dtype: int64 appointment_date 0 icb_ons_code 0 icb_location_name 0 sub_icb_code 0 service_setting 0 context_type 0 national_category 0 appointment_count 0 appointment_month 0 dtype: int64 icb_ons_code 0 appointment_month 0 appointment_status 0 hcp_type 0 appointment_mode 0 booking_lead_time 0 appointment_count 0 dtype: int64
# Delete all duplicated rows
ar.drop_duplicates(keep='first',inplace=True)
Note: regional appointment dataset contained 21,604 duplicated rows.
# Determine duplicated rows
print(ad.duplicated().sum())
print(nc.duplicated().sum())
print(ar.duplicated().sum())
0 0 0
unknown_summary(nc)
Count % appointment_date 0 0.0 icb_ons_code 0 0.0 icb_location_name 0 0.0 sub_icb_code 0 0.0 service_setting 0 0.0 context_type 0 0.0 national_category 0 0.0 appointment_count 0 0.0 appointment_month 0 0.0
unknown_summary(ad)
Count % sub_icb_code 0 0.0 sub_icb_code2 0 0.0 icb_location_name 0 0.0 icb_ons_code 0 0.0 region_ons_code 0 0.0 appointment_date 0 0.0 duration 20161 14.6 appointment_count 0 0.0
unknown_summary(ar)
Count % icb_ons_code 0 0.0 appointment_month 0 0.0 appointment_status 193374 33.6 hcp_type 122958 21.4 appointment_mode 77918 13.5 booking_lead_time 26696 4.6 appointment_count 0 0.0
# Create new 'month' and 'day' columns
ad['month'] = ad['appointment_date'].dt.month_name()
ad['day']=ad['appointment_date'].dt.day_name()
ad.head()
| sub_icb_code | sub_icb_code2 | icb_location_name | icb_ons_code | region_ons_code | appointment_date | duration | appointment_count | month | day | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | E40000012 | 2021-12-01 | 31-60 Minutes | 364 | December | Wednesday |
| 1 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | E40000012 | 2021-12-01 | 21-30 Minutes | 619 | December | Wednesday |
| 2 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | E40000012 | 2021-12-01 | 6-10 Minutes | 1698 | December | Wednesday |
| 3 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | E40000012 | 2021-12-01 | Unknown | 1277 | December | Wednesday |
| 4 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | E40000012 | 2021-12-01 | 16-20 Minutes | 730 | December | Wednesday |
# Determine / confirm number of regional codes
print(ad['region_ons_code'].unique())
print(ad['region_ons_code'].nunique())
['E40000012' 'E40000010' 'E40000011' 'E40000007' 'E40000005' 'E40000006' 'E40000003'] 7
# Replace codes with names and rename column header
region_map = {
'E40000012': 'NE & Yorkshire',
'E40000007': 'East of England',
'E40000003': 'London',
'E40000011': 'Midlands',
'E40000010': 'North West',
'E40000005': 'South East',
'E40000006': 'South West'
}
ad['region_ons_code'] = ad['region_ons_code'].replace(region_map)
ad.rename({'region_ons_code' : 'region'},axis=1, inplace=True)
ad.head()
| sub_icb_code | sub_icb_code2 | icb_location_name | icb_ons_code | region | appointment_date | duration | appointment_count | month | day | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | NE & Yorkshire | 2021-12-01 | 31-60 Minutes | 364 | December | Wednesday |
| 1 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | NE & Yorkshire | 2021-12-01 | 21-30 Minutes | 619 | December | Wednesday |
| 2 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | NE & Yorkshire | 2021-12-01 | 6-10 Minutes | 1698 | December | Wednesday |
| 3 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | NE & Yorkshire | 2021-12-01 | Unknown | 1277 | December | Wednesday |
| 4 | 00L | E38000130 | NHS North East and North Cumbria | E54000050 | NE & Yorkshire | 2021-12-01 | 16-20 Minutes | 730 | December | Wednesday |
# Create new month and day columns for plotting later
ar['month'] = ar['appointment_month'].apply(lambda x: datetime.strptime(x, '%Y-%m').strftime('%B'))
# Count number of 42 icb codes
ar['icb_ons_code'].nunique()
42
# Convert 42 icb codes to named locations
# East of Eng(07), London(03), Midlands(11), NE and Yorkshire (12), North West(10), South East(05), South West(06)
ar.loc[ar['icb_ons_code'].isin(['E54000050','E54000051','E54000054','E54000061']),'region'] = 'North East & Yorkshire'
ar.loc[ar['icb_ons_code'].isin(['E54000008','E54000048','E54000057']),'region'] = 'North West'
ar.loc[ar['icb_ons_code'].isin(['E54000030','E54000031','E54000029','E54000028','E54000027']),'region'] = 'London'
ar.loc[ar['icb_ons_code'].isin(['E54000053','E54000052','E54000044','E54000042',
'E54000034','E54000032']),'region'] = 'South East'
ar.loc[ar['icb_ons_code'].isin(['E54000036','E54000037','E54000038','E54000039',
'E54000040','E54000041','E54000043']),'region'] = 'South West'
ar.loc[ar['icb_ons_code'].isin(['E54000010','E54000011','E54000013','E54000015',
'E54000018','E54000019','E54000055','E54000058',
'E54000059','E54000060','E54000062']),'region'] = 'Midlands'
ar.loc[ar['icb_ons_code'].isin(['E54000056','E54000022','E54000023','E54000024'
,'E54000025','E54000026']),'region'] = 'East of England'
# Filter the data set to only look at data from 2021-08 onwards
ar = ar[ar['appointment_month'] >= '2021-08']
# Delete / Filter out rows with 'Unknown' values in 'Booking Lead Time' column only
ar = ar[ar['booking_lead_time']!='Unknown']
# Filter out'Unknowns' (for heatmaps)
ad = ad[ad['duration']!='Unknown']
# Check 'Unknown' values have been removed
unknown_counts = ad.eq("Unknown").sum()
unknown_counts
sub_icb_code 0 sub_icb_code2 0 icb_location_name 0 icb_ons_code 0 region 0 appointment_date 0 duration 0 appointment_count 0 month 0 day 0 dtype: int64
# Remove strings from duration column (for heatmaps)
ad['duration'] = ad['duration'].str.replace('Minutes','')
# Determine average for the two numbers
ad['duration_float'] = ad['duration'].apply(lambda x: sum(map(float, x.split('-'))))/2
# Create 'month'column
nc['month'] = nc['appointment_date'].dt.month_name()
# Create 'day column
nc['day']=nc['appointment_date'].dt.day_name()
# View top of DataFrame.
nc.head()
| appointment_date | icb_ons_code | icb_location_name | sub_icb_code | service_setting | context_type | national_category | appointment_count | appointment_month | month | day | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Primary Care Network | Care Related Encounter | Patient contact during Care Home Round | 3 | 2021-08 | August | Monday |
| 1 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Other | Care Related Encounter | Planned Clinics | 7 | 2021-08 | August | Monday |
| 2 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | Home Visit | 79 | 2021-08 | August | Monday |
| 3 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | General Consultation Acute | 725 | 2021-08 | August | Monday |
| 4 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | Structured Medication Review | 2 | 2021-08 | August | Monday |
# Create a 'region' column
nc['region']="1"
# Identify number of unique codes
print(nc['icb_ons_code'].unique())
print(nc['icb_ons_code'].nunique())
['E54000050' 'E54000048' 'E54000057' 'E54000008' 'E54000061' 'E54000060' 'E54000054' 'E54000051' 'E54000015' 'E54000010' 'E54000056' 'E54000025' 'E54000023' 'E54000026' 'E54000053' 'E54000044' 'E54000042' 'E54000041' 'E54000043' 'E54000036' 'E54000038' 'E54000039' 'E54000055' 'E54000058' 'E54000037' 'E54000019' 'E54000022' 'E54000031' 'E54000013' 'E54000030' 'E54000059' 'E54000032' 'E54000052' 'E54000040' 'E54000028' 'E54000029' 'E54000018' 'E54000062' 'E54000034' 'E54000024' 'E54000011' 'E54000027'] 42
# Populate 'region' column based on 42 icb codes and 7 region codes
nc.loc[nc['icb_ons_code'].isin(['E54000050','E54000051','E54000054','E54000061']),'region'] = 'North East & Yorkshire'
nc.loc[nc['icb_ons_code'].isin(['E54000008','E54000048','E54000057']),'region'] = 'North West'
nc.loc[nc['icb_ons_code'].isin(['E54000030','E54000031','E54000029','E54000028',
'E54000027']),'region'] = 'London'
nc.loc[nc['icb_ons_code'].isin(['E54000053','E54000052','E54000044','E54000042',
'E54000034','E54000032']),'region'] = 'South East'
nc.loc[nc['icb_ons_code'].isin(['E54000036','E54000037','E54000038','E54000039',
'E54000040','E54000041','E54000043']),'region'] = 'South West'
nc.loc[nc['icb_ons_code'].isin(['E54000010','E54000011','E54000013','E54000015',
'E54000018','E54000019','E54000055','E54000058',
'E54000059','E54000060','E54000062']),'region'] = 'Midlands'
nc.loc[nc['icb_ons_code'].isin(['E54000056','E54000022','E54000023','E54000024'
,'E54000025','E54000026']),'region'] = 'East of England'
nc.head()
| appointment_date | icb_ons_code | icb_location_name | sub_icb_code | service_setting | context_type | national_category | appointment_count | appointment_month | month | day | region | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Primary Care Network | Care Related Encounter | Patient contact during Care Home Round | 3 | 2021-08 | August | Monday | North East & Yorkshire |
| 1 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Other | Care Related Encounter | Planned Clinics | 7 | 2021-08 | August | Monday | North East & Yorkshire |
| 2 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | Home Visit | 79 | 2021-08 | August | Monday | North East & Yorkshire |
| 3 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | General Consultation Acute | 725 | 2021-08 | August | Monday | North East & Yorkshire |
| 4 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | Structured Medication Review | 2 | 2021-08 | August | Monday | North East & Yorkshire |
Initial Data Exploration¶
# Create a new subset
ad_regional = ad[['icb_location_name','region','appointment_count','month','day','duration_float']].copy()
# Create plot to see regions by month and day of week
sns.relplot(data=ad_regional, x='month',y='appointment_count',hue='region',kind='line',col='day')
plt.show()
# Create plot to see regions by day
sns.relplot(data=ad_regional, x='month',y='appointment_count',hue='icb_location_name',kind='line',col='region',
row='day')
plt.show()
# Facet by year and service setting
nc['year'] = pd.to_datetime(nc['appointment_date']).dt.year
service_order = nc['service_setting'].unique()
g = sns.FacetGrid(nc, col='year', col_wrap=3)
g.map(sns.boxplot, 'service_setting',
'appointment_count',
order=service_order)
g.set_xticklabels(rotation=90)
plt.show()
# Facet by region and service setting
service_order = nc['service_setting'].unique()
g = sns.FacetGrid(nc, col='region', col_wrap=3)
g.map(sns.boxplot, 'service_setting',
'appointment_count',
order=service_order)
g.set_xticklabels(rotation=90)
plt.show()
Daily appointment count¶
# Appointments by day (nc dataset)
day_stats = nc.groupby('day')['appointment_count'].sum().sort_values(ascending=False)
result = pd.DataFrame({
'count': day_stats,
'percentage': (day_stats / day_stats.sum()).round(3) * 100
})
print(result)
count percentage day Tuesday 61806933 20.9 Monday 59695267 20.2 Wednesday 58984265 19.9 Thursday 56976354 19.2 Friday 52394868 17.7 Saturday 5574922 1.9 Sunday 614161 0.2
# Daily appointments (Ad dataset)
plot_format()
# Create empty plot
fig, ax = plt.subplots()
plt.title('Daily Appointment Count')
# Use weights parameter to sum appointment_count for each day
ax.hist(ad['day'], bins=7, weights=ad['appointment_count'])
# Add labels
ax.set_ylabel('Total Appointments')
ax.set_xlabel('Day of Week')
plt.xticks(rotation=45)
plt.tight_layout()
Monthly appointments and records¶
# Number of appointments per month (sum of count_of_appointments by month).
records = (nc.groupby('month')['appointment_count'].agg([
('Appointments' , 'sum'),
('Records' , 'count')
]).assign(
pct = lambda x: (x['Appointments'] / x['Appointments'].sum())*100
).round(1)
.sort_values('Appointments',ascending=False))
print(records)
Appointments Records pct month November 30405070 77652 10.3 October 30303834 74078 10.2 March 29595038 82822 10.0 September 28522501 74922 9.6 May 27495508 77425 9.3 June 25828078 74168 8.7 January 25635474 71896 8.7 February 25355260 71769 8.6 December 25140776 72651 8.5 April 23913060 70012 8.1 August 23852171 69999 8.1
Most appointments are in March, October and November
Service Settings (appointments and records)¶
service_count = (nc.groupby('service_setting')['appointment_count'].agg([
('Records', 'count'),
('Appointments', 'sum')])
.assign(percentage=lambda x: (x['Appointments'] / x['Appointments'].sum()) * 100)).round(1)
print(service_count)
Records Appointments percentage service_setting Extended Access Provision 108122 2176807 0.7 General Practice 359274 270811691 91.5 Other 138789 5420076 1.8 Primary Care Network 183790 6557386 2.2 Unmapped 27419 11080810 3.7
Majority of appointments are within a General Practice
National Categories (appointments)¶
# Count records and appointments
category_count = (nc.groupby('national_category')['appointment_count'].agg([
('Appointments' , 'sum')
]).assign(pct=lambda x: x['Appointments'] / x['Appointments'].sum()) * 100
).round(1)
print(category_count)
Appointments pct national_category Care Home Needs Assessment & Personalised Care ... 40590400 0.1 Care Home Visit 62827900 0.2 Clinical Triage 4154696400 14.0 General Consultation Acute 5369115000 18.1 General Consultation Routine 9727152200 32.9 Group Consultation and Group Education 6063200 0.0 Home Visit 214445200 0.7 Inconsistent Mapping 2789080200 9.4 Non-contractual chargeable work 13891100 0.0 Patient contact during Care Home Round 81033000 0.3 Planned Clinical Procedure 2570269400 8.7 Planned Clinics 2801974800 9.5 Service provided by organisation external to th... 85213300 0.3 Social Prescribing Service 47582800 0.2 Structured Medication Review 185837900 0.6 Unmapped 1108081000 3.7 Unplanned Clinical Activity 305579400 1.0 Walk-in 41243800 0.1
Most appointments are general consultations but 14% are clinical triages which tend to occur when services are under pressure
Context Types (appointments and records)¶
# Count records and appointments
context_count = nc.groupby('context_type')['appointment_count'].agg([
('Appointments' , 'sum'),
('Records' , 'count')
])
print(context_count)
Appointments Records context_type Care Related Encounter 257075158 700481 Inconsistent Mapping 27890802 89494 Unmapped 11080810 27419
Day & appointment length¶
# Create pivot table (for heatmap)
duration_pivot = (
ad.pivot_table(
values='appointment_count',
index='duration',
columns='day',
aggfunc='sum'
)
.fillna(0)
.reindex(columns=day_order) # Reorder columns by day
.reindex(min_order) # Reorder rows by duration
)
duration_pivot
| day | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|---|
| duration | |||||||
| 1-5 | 5748884 | 5909039 | 5977117 | 5621236 | 5112844 | 191952 | 39793 |
| 6-10 | 6813610 | 7015810 | 7062545 | 6687714 | 6027435 | 159258 | 34443 |
| 11-15 | 5062230 | 5259482 | 5225380 | 5006471 | 4466494 | 118320 | 22505 |
| 16-20 | 3200248 | 3359463 | 3319817 | 3202112 | 2836363 | 73074 | 13170 |
| 21-30 | 2968529 | 3168346 | 3112977 | 3029930 | 2670309 | 63853 | 12421 |
| 31-60 | 1760412 | 1917537 | 1886506 | 1851003 | 1642298 | 37312 | 8364 |
# Simple Plotly heatmap
fig = go.Figure(data=go.Heatmap(
z=duration_pivot / 1000000,
x=duration_pivot.columns, # Extract column labels (days)
y=duration_pivot.index, # Extract row labels (duration ranges)
colorscale='RdBu_r',
showscale=True,
text=(duration_pivot / 1000000).round(1),
texttemplate="%{z:.1f}",
textfont={"size": 12}
))
fig.update_layout(
title='Appointment Count in Millions',
xaxis_title="Day of Week",
yaxis_title="Duration (minutes)"
)
fig.show()
# Create 2nd pivot table (for heatmaps)
region_pivot = pd.pivot_table(
data=ad,
values='appointment_count',
index='duration',
columns='region',
aggfunc='sum'
)
region_pivot = region_pivot.reindex(min_order) # re-order y axis
region_pivot
| region | East of England | London | Midlands | NE & Yorkshire | North West | South East | South West |
|---|---|---|---|---|---|---|---|
| duration | |||||||
| 1-5 | 3651451 | 3782616 | 5726686 | 4855371 | 2958938 | 4411714 | 3214089 |
| 6-10 | 4214200 | 4387847 | 6856802 | 5772736 | 3568966 | 5130150 | 3870114 |
| 11-15 | 3084578 | 3408083 | 5023018 | 4278092 | 2779066 | 3737221 | 2850824 |
| 16-20 | 1917654 | 2173747 | 3137413 | 2705941 | 1806421 | 2403073 | 1859998 |
| 21-30 | 1757527 | 2045572 | 2893048 | 2501396 | 1727293 | 2297229 | 1804300 |
| 31-60 | 1040916 | 1358224 | 1722190 | 1441085 | 1039191 | 1408854 | 1092972 |
Monthly analysis not possible due to missing data from August to November
Exploration: Region and appointment length)¶
# Simple Plotly heatmap
fig = go.Figure(data=go.Heatmap(
z=region_pivot / 1000000,
x=region_pivot.columns, # Extract column labels (days)
y=region_pivot.index, # Extract row labels (duration ranges)
colorscale='RdBu_r',
showscale=True,
text=(duration_pivot / 1000000).round(1),
texttemplate="%{z:.1f}",
textfont={"size": 12}
))
fig.update_layout(
title='Appointment Count in Millions',
xaxis_title="Day of Week",
yaxis_title="Duration (minutes)"
)
fig.show()
Most appointments are 6-10 minutes long, occur on Tuesday and Wednesday in the Midlands
Appointment lengths¶
# Estimate mean meeting duration
average_duration = (ad['duration_float'].mean().round(2))
print(f"The estimated average appointment is an estimated {average_duration} minutes.")
The estimated average appointment is an estimated 18.64 minutes.
# Create boxplot to show distribution of appointment lengths
plot_format()
sns.violinplot(data=ad['duration_float'])
plt.title('Distribution of Appointment Lengths')
plt.ylabel('Duration (minutes)')
plt.show()
#so the longer appointments may relate to something specific.
There is a skew towards shorter appointments, but also no appointments between 30 and 40 minutes?
# Determine skewness of appointment count -1/+1 optimal, -2/+2 acceptable Curran et al. (1996)
skew = ad['duration_float'].skew().round(1)
print(f"The skewness is {skew} indicating a positive skew which reflects violin plot")
The skewness is 0.9 indicating a positive skew which reflects violin plot
Exploration: Categories and Values¶
# Determine the number of locations
print(f"There are {nc['region'].nunique()} NHS regions")
print(f"There are {nc['icb_ons_code'].nunique()} NHS sub-locations")
There are 7 NHS regions There are 42 NHS sub-locations
most_apps = ad.groupby('icb_location_name').agg({
'appointment_count': 'sum'
}).sort_values('appointment_count', ascending=False)
print(f"The sub-locations with most appointments are:\n{most_apps.head(5)}")
The sub-locations with most appointments are:
appointment_count
icb_location_name
NHS North East and North Cumbria 7450463
NHS West Yorkshire 6418497
NHS Greater Manchester 5494459
NHS North West London 5230192
NHS Cheshire and Merseyside 4978737
most_apps = nc.groupby('icb_location_name').agg({
'appointment_count': 'sum'
}).sort_values('appointment_count', ascending=False)
print(f"The sub-locations with most appointments are:\n{most_apps.head(5)}")
The sub-locations with most appointments are:
appointment_count
icb_location_name
NHS North East and North Cumbria ICB 16882235
NHS West Yorkshire ICB 14358371
NHS Greater Manchester ICB 13857900
NHS Cheshire and Merseyside ICB 13250311
NHS North West London ICB 12142390
print(f"There are {nc['service_setting'].nunique()} service settings. These are:")
print(nc['service_setting'].unique())
There are 5 service settings. These are: ['Primary Care Network' 'Other' 'General Practice' 'Unmapped' 'Extended Access Provision']
print(f"There are {(nc['context_type'].nunique())} context types. These are:")
print(nc['context_type'].unique())
There are 3 context types. These are: ['Care Related Encounter' 'Unmapped' 'Inconsistent Mapping']
# Determine the number of national categories.
print(f"There are {(nc['national_category'].nunique())} national categories. These are:")
print(nc['national_category'].unique())
There are 18 national categories. These are: ['Patient contact during Care Home Round' 'Planned Clinics' 'Home Visit' 'General Consultation Acute' 'Structured Medication Review' 'Care Home Visit' 'Unmapped' 'Clinical Triage' 'Planned Clinical Procedure' 'Inconsistent Mapping' 'Care Home Needs Assessment & Personalised Care and Support Planning' 'General Consultation Routine' 'Service provided by organisation external to the practice' 'Unplanned Clinical Activity' 'Social Prescribing Service' 'Non-contractual chargeable work' 'Group Consultation and Group Education' 'Walk-in']
# Determine unique values within each column
unique = nc.select_dtypes(include=['object', 'category']).nunique().to_dict()
unique
{'icb_ons_code': 42,
'icb_location_name': 42,
'sub_icb_code': 106,
'service_setting': 5,
'context_type': 3,
'national_category': 18,
'appointment_month': 11,
'month': 11,
'day': 7,
'region': 7}
print(ar['month'].unique())
['August' 'September' 'October' 'November' 'December' 'January' 'February' 'March' 'April' 'May' 'June']
Note: July is missing from the data
unique = ar.select_dtypes(include=['object', 'category']).nunique().to_dict()
unique
{'icb_ons_code': 42,
'appointment_month': 11,
'appointment_status': 3,
'hcp_type': 3,
'appointment_mode': 5,
'booking_lead_time': 7,
'month': 11,
'region': 7}
# Determine the number of appointment statuses.
print(ar['appointment_status'].unique())
print(ar['hcp_type'].unique())
print(ar['appointment_mode'].unique())
['Attended' 'DNA' 'Unknown'] ['GP' 'Other Practice staff' 'Unknown'] ['Home Visit' 'Video/Online' 'Face-to-Face' 'Unknown' 'Telephone']
Most appointments occur between Sept - Nov and in March
Regional analysis¶
# create a new subset
nc_regional = nc[['icb_location_name','region','appointment_count','month','day']].copy()
# Create plot
plot_format()
sns.lineplot(data=nc_regional,
x='month',
y='appointment_count',
hue='region',
errorbar=None)
plt.title('Regional appointment count')
plt.xticks(rotation=45)
plt.legend(loc='upper right')
plt.legend(
bbox_to_anchor=(1.05, 1),
loc='upper left');
plt.show()
#plt.savefig('slide_6.png',dpi=150, bbox_inches='tight')
# create a new subset for regional analysis of ad data.
ar_regional = ar[['appointment_status', 'hcp_type', 'appointment_mode',
'booking_lead_time', 'appointment_count', 'month']].copy()
# view the subset
ar_regional.head()
| appointment_status | hcp_type | appointment_mode | booking_lead_time | appointment_count | month | |
|---|---|---|---|---|---|---|
| 350 | Attended | GP | Home Visit | More than 28 Days | 1 | August |
| 351 | Attended | GP | Video/Online | Same Day | 1 | August |
| 352 | Attended | Other Practice staff | Video/Online | More than 28 Days | 1 | August |
| 353 | Attended | Unknown | Face-to-Face | 8 to 14 Days | 1 | August |
| 355 | DNA | GP | Home Visit | 1 Day | 1 | August |
sns.lineplot(data=ar, x='month',y='appointment_count',hue='region',errorbar=None)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.8))
plt.title('Regional Appointments')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Location and service setting¶
def analyze_location(location_name):
return (nc
.query(f"icb_location_name == '{location_name}'")
.pipe(lambda df: df.groupby('service_setting')['appointment_count']
.agg([
('Count', 'count'),
('%', lambda x: (x.count() / len(df) * 100).round(2))
])))
# This creates a DataFrame with MultiIndex
all_locations = pd.concat({
loc: analyze_location(loc)
for loc in nc['icb_location_name'].unique()
})
heatmap_data = all_locations['%'].unstack().fillna(0)
fig = go.Figure(go.Heatmap(
z=heatmap_data.values,
x=heatmap_data.columns,
y=heatmap_data.index,
colorscale='YlGnBu',
text=heatmap_data.round(1).astype(str),
texttemplate="%{text}", # Show labels on heatmap
hovertemplate="Service: %{x}<br>Location: %{y}<br>%{z:.1f}%<extra></extra>",
colorbar=dict(title="Percentage")
))
fig.update_layout(
title="Service Setting by Location",
height=1000,
xaxis_tickangle=-45
)
fig.show()
Exploration: Location and context type¶
def analyze_location(location_name):
return (nc
.query(f"icb_location_name == '{location_name}'")
.pipe(lambda df: df.groupby('context_type')['appointment_count']
.agg([
('Count', 'count'),
('%', lambda x: (x.count() / len(df) * 100).round(2))
])))
# This creates a DataFrame with MultiIndex
all_locations = pd.concat({
loc: analyze_location(loc)
for loc in nc['icb_location_name'].unique()
})
heatmap_data = all_locations['%'].unstack().fillna(0)
fig = go.Figure(go.Heatmap(
z=heatmap_data.values,
x=heatmap_data.columns,
y=heatmap_data.index,
colorscale='YlGnBu',
text=heatmap_data.round(1).astype(str),
texttemplate="%{text}", # Show labels on heatmap
hovertemplate="Service: %{x}<br>Location: %{y}<br>%{z:.1f}%<extra></extra>",
colorbar=dict(title="Percentage")
))
fig.update_layout(
title="Context types by Location",
height=1000,
xaxis_tickangle=-45
)
fig.show()
Location and National Categories¶
def analyze_location(location_name):
return (nc
.query(f"icb_location_name == '{location_name}'")
.pipe(lambda df: df.groupby('national_category')['appointment_count']
.agg([
('Count', 'count'),
('%', lambda x: (x.count() / len(df) * 100).round(2))
])))
# This creates a DataFrame with MultiIndex
all_locations = pd.concat({
loc: analyze_location(loc)
for loc in nc['icb_location_name'].unique()
})
heatmap_data = all_locations['%'].unstack().fillna(0)
fig = go.Figure(go.Heatmap(
z=heatmap_data.values,
x=heatmap_data.columns,
y=heatmap_data.index,
colorscale='YlGnBu',
text=heatmap_data.round(1).astype(str),
texttemplate="%{text}", # Show labels on heatmap
hovertemplate="Service: %{x}<br>Location: %{y}<br>%{z:.1f}%<extra></extra>",
colorbar=dict(title="Percentage")
))
fig.update_layout(
title="National Categories by Location",
height=1000,
xaxis_tickangle=-45
)
fig.show()
There is some variation across regions but nothing that stands out as anomalous
Appointment visualisations¶
Service settings / appointment count
# filter necessary columns
service = nc[['month','service_setting','appointment_count']]
# Create lineplot
plot_format()
sns.lineplot(data=service,
x='month',
y='appointment_count',
hue='service_setting',
errorbar=None)
plt.xticks(rotation=45)
plt.title('Number of appointments for service settings')
plt.show()
Context types / appointment count
# Create context type dataframe
context = nc[['context_type','month','appointment_count']].copy()
# Create a lineplot.
plot_format()
sns.lineplot(data=context,
x='month',
hue='context_type',
y='appointment_count',
errorbar=None)
plt.xticks(rotation=45)
plt.title('Number of appointments for context types');
# Note: I don't think data is aggregated in chart below
National categories / appointment count
# Create dataframe
national = nc[['month','national_category','service_setting','context_type','appointment_count']].copy()
# Create lineplot.
plt.figure(figsize=(10,8))
plot_format()
sns.lineplot(data=national,
x='month',
hue='national_category',
y='appointment_count',errorbar=None)
plt.xticks(rotation=45)
plt.legend(bbox_to_anchor=(0.5, -0.15), loc='upper center',ncol=2)
plt.title('Number of appointments by national category')
plt.tight_layout()
plt.show()
#plt.savefig("slide_5.png", dpi =150, bbox_inches='tight')
Note: Routine and Acute general consultations are the main driver.
National Categories continued: Filtered out least used categories (see conclusions)¶
# Keep only specific categories
categories_to_keep = ['Planned Clinics', 'General Consultation Routine', 'Clinical Triage',
'Planned Clinical Procedure']
national_filtered = national[national['national_category'].isin(categories_to_keep)].copy()
plot_format()
sns.lineplot(data=national_filtered,
x='month',
y='appointment_count',
hue='national_category')
# Move legend below the chart
plt.legend(
bbox_to_anchor=(0, -0.2), # x, y coordinates
loc='upper left',ncol=3);
plt.title('Autumn and March Planned Procedures and Clinics')
plt.xticks(rotation=45)
plt.savefig('slide_5a.png', dpi=150, bbox_inches='tight')
plt.show()
Note: Planned procedures and planned clinics peak in Autumn, overall peaks in autumn and March.
Filter for General Practices only¶
# Apply two filters at once
gp_filtered = national[
(national['service_setting'] == 'General Practice') &
(national['national_category'].isin(categories_to_keep))
].copy()
plot_format()
sns.lineplot(data=gp_filtered,
x='month',
y='appointment_count',
hue='national_category'
)
plt.legend(
bbox_to_anchor=(0, -0.2), # x, y coordinates
loc='upper left', ncol=3)
plt.title('Autumn and March Planned Procedures and Clinics (GPs)')
plt.xticks(rotation=45)
plt.savefig('slide_5b.png', dpi=150, bbox_inches='tight')
plt.show()
Note: Similar pattern in General Practice compared to overall pattern
Filter for PCNs only¶
# Apply two filters at once
pcn_filtered = national[
(national['service_setting'] == 'Primary Care Network') &
(national['national_category'].isin(categories_to_keep))
].copy()
plot_format()
sns.lineplot(data=pcn_filtered,
x='month',
y='appointment_count',
hue='national_category')
plt.legend(
bbox_to_anchor=(0, -0.2), # x, y coordinates
loc='upper left',ncol=3);
plt.title('Autumn and March Planned Procedures and Clinics (PCNs)')
plt.xticks(rotation=45)
plt.savefig('slide_5c.png', dpi=150, bbox_inches='tight')
plt.show()
Note: different pattern in PCNs. Routine consultations peak in March / May
# Pivot data for Extended Access Provision only
natcat_pivot4 = pd.pivot_table(
data=nc[nc['service_setting']=='Extended Access Provision'],
values='appointment_count',
index='month',
columns='national_category',
aggfunc='sum'
)
# Reindex the pivot table with the specified month order
ordered_pivot = natcat_pivot4.reindex(month_order)
filtered_natcat4 = ordered_pivot.drop(['Group Consultation and Group Education',
'Walk-in','Care Home Needs Assessment & Personalised Care and Support Planning',
'Care Home Visit','Non-contractual chargeable work',
'Patient contact during Care Home Round',
'Service provided by organisation external to the practice',
'Social Prescribing Service','Structured Medication Review','Home Visit',
'Inconsistent Mapping','Unplanned Clinical Activity'], axis=1)
plot_format()
sns.lineplot(data=filtered_natcat4)
# Move legend below the chart
plt.legend(
bbox_to_anchor=(0, -0.2), # x, y coordinates
loc='upper left',ncol=3);
plt.title('Extended Access Provision Appointments')
plt.xticks(rotation=45)
plt.savefig('slide_5d.png', dpi=150, bbox_inches='tight')
plt.show()
Note: Significant peak for planned clinics in Autumn and a smaller peak for planned procedures. In February, the most significant peak is for Acute (sudden severe) general consultation. May warmer weather lead to more allergic reactions>
Focus on Clinical Triage (as triage is a reflection of high demand)¶
filtered = nc[nc['national_category'] == 'Clinical Triage']
triage = filtered.groupby('day')['appointment_count'].sum().reset_index()
fig = go.Figure()
fig.add_trace(go.Bar(
x=triage['day'],
y=triage['appointment_count'],
marker_color='blue', # Direct color assignment
name='Clinical Triage'
))
fig.update_layout(
title='Clinical Triage Appointments by Day',
xaxis_title='Day',
yaxis_title='Appointment Count'
)
fig.show()
Above pattern reflects overall appointments
triage = (nc.groupby('region')
.agg(
total_appointments=('appointment_count', 'sum'),
triage_appointments=('appointment_count', lambda x: x[nc.loc[x.index,
'national_category'] == 'Clinical Triage'].sum())
)
.assign(pct=lambda x: (x['triage_appointments'] / x['total_appointments'] * 100).round(1))
.reset_index())
triage
| region | total_appointments | triage_appointments | pct | |
|---|---|---|---|---|
| 0 | East of England | 34055047 | 4983513 | 14.6 |
| 1 | London | 43484439 | 6827053 | 15.7 |
| 2 | Midlands | 57352329 | 7896633 | 13.8 |
| 3 | North East & Yorkshire | 47915966 | 5815851 | 12.1 |
| 4 | North West | 35865459 | 5261201 | 14.7 |
| 5 | South East | 45003534 | 6529038 | 14.5 |
| 6 | South West | 32369996 | 4233675 | 13.1 |
fig = go.Figure()
fig.add_trace(go.Bar(
x=triage['region'],
y=triage['total_appointments'],
marker_color='blue', # Direct color assignment
name='Other'
))
fig.add_trace(go.Bar(
x=triage['region'],
y=triage['triage_appointments'],
marker_color='red', # Direct color assignment
name='Triage '
))
fig.update_layout(
title='Clinical Triage Appointments by Region',
xaxis_title='Region',
yaxis_title='Appointment Count'
)
fig.show()
# Create pivot
triage = (nc
.pivot_table(
values='appointment_count',
index=['month','service_setting'],
columns='national_category',
aggfunc='sum',
fill_value=0
)
.assign(
total_appointments=lambda x: x.sum(axis=1),
pct=lambda x: (x.get('Clinical Triage', 0) / x.sum(axis=1) * 100).round(1)
)
.rename(columns={'Clinical Triage': 'triage_appointments'})
[['total_appointments', 'triage_appointments', 'pct']]
.reindex(month_order, level='month')
.reset_index()
)
fig = px.line(triage,
x='month',
y='pct',
color='service_setting',
title='Clinical Triage Appointments by Service Setting',
markers=True)
fig.update_layout(
xaxis_title='Month',
yaxis_title='Percentage (%)',
height=500
)
nc.head(1)
| appointment_date | icb_ons_code | icb_location_name | sub_icb_code | service_setting | context_type | national_category | appointment_count | appointment_month | month | day | region | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Primary Care Network | Care Related Encounter | Patient contact during Care Home Round | 3 | 2021-08 | August | Monday | North East & Yorkshire | 2021 |
triage2 = (nc.groupby('service_setting')
.agg(
total_appointments=('appointment_count', 'sum'),
triage_appointments=('appointment_count', lambda x: x[nc.loc[x.index,
'national_category'] == 'Clinical Triage'].sum())
)
.assign(pct=lambda x: (x['triage_appointments'] / x['total_appointments'] * 100).round(1))
.reset_index()
)
triage2
| service_setting | total_appointments | triage_appointments | pct | |
|---|---|---|---|---|
| 0 | Extended Access Provision | 2176807 | 382876 | 17.6 |
| 1 | General Practice | 270811691 | 40246576 | 14.9 |
| 2 | Other | 5420076 | 575083 | 10.6 |
| 3 | Primary Care Network | 6557386 | 342429 | 5.2 |
| 4 | Unmapped | 11080810 | 0 | 0.0 |
#chart moved to conclusions
Primary Care Networks are under less pressure, suggesting they work better than individual practices. Or it's lower because PCNs concentrate on preventative care
GPs = reactive care focus / doctor centric PCN - preventative care focus / range of roles / clinical pharmacists / mental health / chronic disease proactively managed
Create above over months¶
Compare authorities with highest and lowest PCNs: Cambridge (056) & South Essex (026)¶
filtered = ar[
ar['icb_ons_code'].str.contains('000056', case=False, na=False)|
ar['icb_ons_code'].str.contains('000026', case=False, na=False)
]
filtered.tail(3)
| icb_ons_code | appointment_month | appointment_status | hcp_type | appointment_mode | booking_lead_time | appointment_count | month | region | |
|---|---|---|---|---|---|---|---|---|---|
| 596199 | E54000056 | 2022-03 | Attended | GP | Telephone | Same Day | 78283 | March | East of England |
| 596208 | E54000056 | 2021-09 | Attended | GP | Telephone | Same Day | 78801 | September | East of England |
| 596241 | E54000056 | 2021-11 | Attended | GP | Telephone | Same Day | 80719 | November | East of England |
# Cambridge vs South Essex - breakdown by status
cam_ess = (pd.crosstab(filtered['icb_ons_code'],
filtered['appointment_status'],
values=filtered['appointment_count'],
aggfunc='sum')
.reset_index()
.assign(
Total=lambda x: x['Attended'] + x['DNA'],
DNA_rate=lambda x: x['DNA'] / (x['Attended'] + x['DNA']),
Attended_rate=lambda x: x['Attended'] / (x['Attended'] + x['DNA'])
))
cam_ess
| appointment_status | icb_ons_code | Attended | DNA | Unknown | Total | DNA_rate | Attended_rate |
|---|---|---|---|---|---|---|---|
| 0 | E54000026 | 5225460 | 166364 | 230151 | 5391824 | 0.030855 | 0.969145 |
| 1 | E54000056 | 4756334 | 174186 | 183691 | 4930520 | 0.035328 | 0.964672 |
# Cambridge vs South Essex - breakdown by status
cam_ess = (pd.crosstab([filtered['icb_ons_code'], filtered['appointment_mode']],
filtered['appointment_status'],
values=filtered['appointment_count'],
aggfunc='sum')
.reset_index()
.assign(
Total=lambda x: x['Attended'] + x['DNA'],
DNA_rate=lambda x: x['DNA'] / (x['Attended'] + x['DNA']),
attended_rate=lambda x: x['Attended'] / (x['Attended'] + x['DNA'])
))
cam_ess
| appointment_status | icb_ons_code | appointment_mode | Attended | DNA | Unknown | Total | DNA_rate | attended_rate |
|---|---|---|---|---|---|---|---|---|
| 0 | E54000026 | Face-to-Face | 3518398 | 144996 | 173329 | 3663394 | 0.039580 | 0.960420 |
| 1 | E54000026 | Home Visit | 9975 | 342 | 502 | 10317 | 0.033149 | 0.966851 |
| 2 | E54000026 | Telephone | 1440289 | 11070 | 38266 | 1451359 | 0.007627 | 0.992373 |
| 3 | E54000026 | Unknown | 191566 | 6232 | 15220 | 197798 | 0.031507 | 0.968493 |
| 4 | E54000026 | Video/Online | 65232 | 3724 | 2834 | 68956 | 0.054005 | 0.945995 |
| 5 | E54000056 | Face-to-Face | 2795587 | 154868 | 124568 | 2950455 | 0.052490 | 0.947510 |
| 6 | E54000056 | Home Visit | 12370 | 155 | 956 | 12525 | 0.012375 | 0.987625 |
| 7 | E54000056 | Telephone | 1642559 | 9908 | 44132 | 1652467 | 0.005996 | 0.994004 |
| 8 | E54000056 | Unknown | 262449 | 7007 | 11215 | 269456 | 0.026004 | 0.973996 |
| 9 | E54000056 | Video/Online | 43369 | 2248 | 2820 | 45617 | 0.049280 | 0.950720 |
cam_ess = (pd.crosstab(filtered['icb_ons_code'],
filtered['hcp_type'],
values=filtered['appointment_count'],
aggfunc='sum')
.reset_index()
.assign(
Total=lambda x: x['GP'] + x['Other Practice staff'],
GP_pct=lambda x: x['GP'] / (x['GP'] + x['Other Practice staff'])
))
cam_ess
| hcp_type | icb_ons_code | GP | Other Practice staff | Unknown | Total | GP_pct |
|---|---|---|---|---|---|---|
| 0 | E54000026 | 2694766 | 2729563 | 197646 | 5424329 | 0.496793 |
| 1 | E54000056 | 2232712 | 2612805 | 268694 | 4845517 | 0.460779 |
As expected higher proportion of GP appointments in South Essex (GP) compared to PCN. This perhaps explains slightly higher attendance above.
However, whilst Face to Face visits are lower in Cambridge, all other modes are higher
# Cambridge vs South Essex - breakdown by status
cam_ess = (pd.crosstab([filtered['icb_ons_code'], filtered['appointment_mode']],
filtered['hcp_type'],
values=filtered['appointment_count'],
aggfunc='sum')
.reset_index()
.assign(
Total=lambda x: x['Other Practice staff'] + x['GP'],
GP_pct=lambda x: x['GP'] / (x['Other Practice staff'] + x['GP'])
.round(2)
))
cam_ess
| hcp_type | icb_ons_code | appointment_mode | GP | Other Practice staff | Unknown | Total | GP_pct |
|---|---|---|---|---|---|---|---|
| 0 | E54000026 | Face-to-Face | 1693744.0 | 2142979.0 | NaN | 3836723.0 | 0.441456 |
| 1 | E54000026 | Home Visit | 3998.0 | 6821.0 | NaN | 10819.0 | 0.369535 |
| 2 | E54000026 | Telephone | 962926.0 | 526699.0 | NaN | 1489625.0 | 0.646422 |
| 3 | E54000026 | Unknown | 7718.0 | 7654.0 | 197646.0 | 15372.0 | 0.502082 |
| 4 | E54000026 | Video/Online | 26380.0 | 45410.0 | NaN | 71790.0 | 0.367461 |
| 5 | E54000056 | Face-to-Face | 1078383.0 | 1984489.0 | 12151.0 | 3062872.0 | 0.352082 |
| 6 | E54000056 | Home Visit | 4812.0 | 8140.0 | 529.0 | 12952.0 | 0.371526 |
| 7 | E54000056 | Telephone | 1110505.0 | 571659.0 | 14435.0 | 1682164.0 | 0.660165 |
| 8 | E54000056 | Unknown | 31139.0 | 7953.0 | 241579.0 | 39092.0 | 0.796557 |
| 9 | E54000056 | Video/Online | 7873.0 | 40564.0 | NaN | 48437.0 | 0.162541 |
fig = px.bar(cam_ess,
x='appointment_mode',
y='GP_pct',
color='icb_ons_code', # This works in px.bar
title='GP Percentage by Appointment Mode and ICB Code',
labels={'GP_pct': 'GP Percentage', 'appointment_mode': 'Appointment Mode'})
fig.update_layout(
xaxis_tickangle=-45,
barmode='group'
)
fig.show()
Exploration: Utilisation and Capacity¶
# Calculate overall utilisation (95.63%)
percent = pd.pivot_table(
data=ar,
values='appointment_count',
columns='appointment_status',
aggfunc='sum'
)
percent['total'] = percent['Attended'] + percent['DNA']
percent['utilisation(%)'] = (percent['Attended']/ percent['total'] * 100).round(2)
percent
| appointment_status | Attended | DNA | Unknown | total | utilisation(%) |
|---|---|---|---|---|---|
| appointment_count | 270479029 | 13280848 | 11992521 | 283759877 | 95.32 |
Only 95.3% appointments are being utilised. This tends to suggest that there is slack in the system. Staffing levels therefore may/may not be adequate. This is incertain if networks are not at 100% capacity.
Utilisation by month¶
# Pivot to get monthly utilisation
month_ = (
ar.pivot_table(values='appointment_count',
index='month',
columns='appointment_status',
aggfunc='sum')
.assign(
total=lambda df: df['Attended'] + df['DNA'],
utilisation_pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
)
.reindex(month_order)
)
month_
| appointment_status | Attended | DNA | Unknown | total | utilisation_pct |
|---|---|---|---|---|---|
| month | |||||
| August | 22072706.0 | 944674.0 | 812626.0 | 23017380.0 | 95.90 |
| September | 25745698.0 | 1318269.0 | 1435376.0 | 27063967.0 | 95.13 |
| October | 27158054.0 | 1562316.0 | 1560182.0 | 28720370.0 | 94.56 |
| November | 27649315.0 | 1425547.0 | 1301758.0 | 29074862.0 | 95.10 |
| December | 22840398.0 | 1195213.0 | 1082466.0 | 24035611.0 | 95.03 |
| January | 23580128.0 | 1072474.0 | 955851.0 | 24652602.0 | 95.65 |
| February | 23333376.0 | 1073255.0 | 919910.0 | 24406631.0 | 95.60 |
| March | 27149715.0 | 1287042.0 | 1128660.0 | 28436757.0 | 95.47 |
| April | 21935445.0 | 1042100.0 | 912479.0 | 22977545.0 | 95.46 |
| May | 25318735.0 | 1195368.0 | 945083.0 | 26514103.0 | 95.49 |
| June | 23695459.0 | 1164590.0 | 938130.0 | 24860049.0 | 95.32 |
| July | NaN | NaN | NaN | NaN | NaN |
# Calculate the average utilisation of services by month Utilisation based on attended / DNA
monthly_apps = pd.pivot_table(
data=ar,
values='appointment_count',
index='month',
columns=['appointment_status'],
aggfunc='sum'
)
monthly_apps['total'] = monthly_apps['Attended'] + monthly_apps['DNA']
# re-order months in pivot table
monthly_apps = monthly_apps.reindex(month_order)
# Created a column showing % of appointments used
monthly_apps['utilisation(%)'] = (monthly_apps['Attended']/ monthly_apps['total'] * 100).round(2)
# Created a column showing to a daily value (based on 30 days)
monthly_apps['daily_attendance'] = (monthly_apps['Attended'] / 30).round(2)
# View the DataFrame.
monthly_apps
| appointment_status | Attended | DNA | Unknown | total | utilisation(%) | daily_attendance |
|---|---|---|---|---|---|---|
| month | ||||||
| August | 22072706.0 | 944674.0 | 812626.0 | 23017380.0 | 95.90 | 735756.87 |
| September | 25745698.0 | 1318269.0 | 1435376.0 | 27063967.0 | 95.13 | 858189.93 |
| October | 27158054.0 | 1562316.0 | 1560182.0 | 28720370.0 | 94.56 | 905268.47 |
| November | 27649315.0 | 1425547.0 | 1301758.0 | 29074862.0 | 95.10 | 921643.83 |
| December | 22840398.0 | 1195213.0 | 1082466.0 | 24035611.0 | 95.03 | 761346.60 |
| January | 23580128.0 | 1072474.0 | 955851.0 | 24652602.0 | 95.65 | 786004.27 |
| February | 23333376.0 | 1073255.0 | 919910.0 | 24406631.0 | 95.60 | 777779.20 |
| March | 27149715.0 | 1287042.0 | 1128660.0 | 28436757.0 | 95.47 | 904990.50 |
| April | 21935445.0 | 1042100.0 | 912479.0 | 22977545.0 | 95.46 | 731181.50 |
| May | 25318735.0 | 1195368.0 | 945083.0 | 26514103.0 | 95.49 | 843957.83 |
| June | 23695459.0 | 1164590.0 | 938130.0 | 24860049.0 | 95.32 | 789848.63 |
| July | NaN | NaN | NaN | NaN | NaN | NaN |
plot_format()
# Plot monthly capacity utilisation.
sns.lineplot(data=monthly_apps,x='month',y='total',label='Total')
sns.lineplot(data=monthly_apps,x='month',y='Attended', label='Attended')
sns.lineplot(data=monthly_apps,x='month',y='DNA',label='Did Not Attend')
plt.legend()
plt.title('Chart showing 95% attendance')
# Re-order months
ax = plt.gca()
plt.xticks(rotation=45)
plt.title("Appointment Attendance");
# Create a lineplot.
Utilisation ranges between 94.56% and 95.9%
Utilisation and staff¶
# Determine % of appointments cancelled relating to hcp_type
hcp =(
ar.pivot_table(
values='appointment_count',
index='hcp_type',
columns='appointment_status',
aggfunc='sum'
)
.assign(
total = lambda df: df['Attended'] + df['DNA'], # create total
pct = lambda df: (df['Attended']/ df['total'] * 100).round(2) # create %
))
hcp
| appointment_status | Attended | DNA | Unknown | total | pct |
|---|---|---|---|---|---|
| hcp_type | |||||
| GP | 140102040 | 3829781 | 4305542 | 143931821 | 97.34 |
| Other Practice staff | 121734048 | 9178841 | 6933459 | 130912889 | 92.99 |
| Unknown | 8642941 | 272226 | 753520 | 8915167 | 96.95 |
# Medical appointments
hcp_pivot = pivot_function(
ar, 'hcp_type', 'appointment_status', 'appointment_count', 'month',
target_outcome='Attended'
)
hcp_pivot = hcp_pivot.reindex(month_order)
hcp_pivot
| hcp_type | GP | Other Practice staff | GP | Other Practice staff | ||
|---|---|---|---|---|---|---|
| appointment_status | Attended | DNA | Attended | DNA | Attended (%) | Attended (%) |
| month | ||||||
| August | 11708057.0 | 271814.0 | 9688402.0 | 652491.0 | 97.73 | 93.69 |
| September | 13592889.0 | 392037.0 | 11358456.0 | 901514.0 | 97.20 | 92.65 |
| October | 13339524.0 | 439647.0 | 12888298.0 | 1090065.0 | 96.81 | 92.20 |
| November | 14050249.0 | 388131.0 | 12644696.0 | 1002219.0 | 97.31 | 92.66 |
| December | 11957206.0 | 319638.0 | 10113211.0 | 846302.0 | 97.40 | 92.28 |
| January | 12470633.0 | 302384.0 | 10401221.0 | 747854.0 | 97.63 | 93.29 |
| February | 12198417.0 | 309926.0 | 10417610.0 | 742160.0 | 97.52 | 93.35 |
| March | 14305014.0 | 382332.0 | 12001478.0 | 880620.0 | 97.40 | 93.16 |
| April | 11258586.0 | 304701.0 | 9968784.0 | 718013.0 | 97.36 | 93.28 |
| May | 13049643.0 | 364671.0 | 11474524.0 | 808645.0 | 97.28 | 93.42 |
| June | 12171822.0 | 354500.0 | 10777368.0 | 788958.0 | 97.17 | 93.18 |
| July | NaN | NaN | NaN | NaN | NaN | NaN |
Both GPs and other staff are busy at similar types, although during autumn significantly more pressure is on 'other healthcare professionals'.
plot_format()
# Pandas method
ax = hcp_pivot[[('GP', 'Attended (%)'), ('Other Practice staff', 'Attended (%)')]].plot(
kind='line',
marker='o',
title='Appointments Attended by HCP Type (%)'
)
ax.set_ylabel('Attended (%)')
ax.legend(['GP', 'Other Practice staff'])
plt.tight_layout()
plt.show()
Utilisation and Appointment Mode¶
Overall, Face-to-Face appointments are cancelled most frequently and Telephone appointments the least.
# Determine % of appointments cancelled by mode
mode = (
ar.pivot_table(
values='appointment_count',
index='appointment_mode',
columns='appointment_status',
aggfunc='sum'
)
.assign(
total=lambda df: df['Attended'] + df['DNA'],
pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
)
)
mode
| appointment_status | Attended | DNA | Unknown | total | pct |
|---|---|---|---|---|---|
| appointment_mode | |||||
| Face-to-Face | 164434900 | 10743715 | 8174625 | 175178615 | 93.87 |
| Home Visit | 1494800 | 76189 | 390143 | 1570989 | 95.15 |
| Telephone | 95306218 | 2080948 | 2792120 | 97387166 | 97.86 |
| Unknown | 7829901 | 303177 | 586592 | 8133078 | 96.27 |
| Video/Online | 1413210 | 76819 | 49041 | 1490029 | 94.84 |
Utilisation by Appointment Mode and Month¶
# Medical appointments
mode_pivot = pivot_function(
ar, 'appointment_mode', 'appointment_status', 'appointment_count', 'month',
target_outcome='Attended'
)
mode_pivot = mode_pivot.reindex(month_order)
mode_pivot
| appointment_mode | Face-to-Face | Home Visit | Telephone | Video/Online | Face-to-Face | Home Visit | Telephone | Video/Online | ||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| appointment_status | Attended | DNA | Attended | DNA | Attended | DNA | Attended | DNA | Attended (%) | Attended (%) | Attended (%) | Attended (%) |
| month | ||||||||||||
| August | 12481186.0 | 723111.0 | 113533.0 | 6549.0 | 8724392.0 | 184850.0 | 101541.0 | 5136.0 | 94.52 | 94.55 | 97.93 | 95.19 |
| September | 15142705.0 | 1066003.0 | 126549.0 | 7517.0 | 9575370.0 | 205075.0 | 137877.0 | 7882.0 | 93.42 | 94.39 | 97.90 | 94.59 |
| October | 16982958.0 | 1312707.0 | 132144.0 | 7574.0 | 8978200.0 | 190570.0 | 156905.0 | 8900.0 | 92.83 | 94.58 | 97.92 | 94.63 |
| November | 16949964.0 | 1167488.0 | 149543.0 | 8729.0 | 9497883.0 | 201527.0 | 146366.0 | 8595.0 | 93.56 | 94.48 | 97.92 | 94.45 |
| December | 13597975.0 | 971685.0 | 132406.0 | 8320.0 | 8289616.0 | 175815.0 | 114154.0 | 7146.0 | 93.33 | 94.09 | 97.92 | 94.11 |
| January | 13921132.0 | 851572.0 | 127718.0 | 7413.0 | 8776010.0 | 182045.0 | 120476.0 | 6494.0 | 94.24 | 94.51 | 97.97 | 94.89 |
| February | 14059436.0 | 858543.0 | 129188.0 | 6069.0 | 8388916.0 | 179837.0 | 118086.0 | 6131.0 | 94.24 | 95.51 | 97.90 | 95.06 |
| March | 16536414.0 | 1035050.0 | 152788.0 | 6369.0 | 9579993.0 | 212548.0 | 140662.0 | 7199.0 | 94.11 | 96.00 | 97.83 | 95.13 |
| April | 13588356.0 | 841473.0 | 132294.0 | 5221.0 | 7492249.0 | 169628.0 | 117385.0 | 5983.0 | 94.17 | 96.20 | 97.79 | 95.15 |
| May | 16020354.0 | 968223.0 | 153939.0 | 6632.0 | 8350132.0 | 193483.0 | 133287.0 | 6804.0 | 94.30 | 95.87 | 97.74 | 95.14 |
| June | 15154420.0 | 947860.0 | 144698.0 | 5796.0 | 7653457.0 | 185570.0 | 126471.0 | 6549.0 | 94.11 | 96.15 | 97.63 | 95.08 |
| July | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Chart moved to conclusions:
# Plotly Method
fig = go.Figure()
# Add each line directly from the pivot table
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Face-to-Face', 'Attended (%)')],
mode='lines+markers',
name='Face to Face'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Home Visit', 'Attended (%)')],
mode='lines+markers',
name='Home Visit'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Telephone','Attended (%)')],
mode='lines+markers',
name='Telephone'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Video/Online','Attended (%)')],
mode='lines+markers',
name='Video'
))
fig.update_layout(
title='Appointments Attended by Mode (%)',
xaxis_title='Month',
yaxis_title='Attended (%)',
height=500
)
fig.show()
There are considerably more home visits from late winter into spring. What might be driving this?
Total appointments by mode¶
Home visit attendance ranges between 94.1 and 96.2%. Highest attendance during busiest period.
# Plotly Method
fig = go.Figure()
# Add each line directly from the pivot table
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Face-to-Face', 'Attended')],
mode='lines+markers',
name='Face to Face'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Home Visit', 'Attended')],
mode='lines+markers',
marker=dict(symbol='square',size=10,color='red'),
name='Home Visit'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Telephone','Attended')],
mode='lines+markers',
name='Telephone'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Video/Online', 'Attended')],
mode='lines+markers',
line=dict(dash='dot', width=3,color='yellow'),
name='Video'
))
fig.update_layout(
title='Appointment Totals by Mode',
xaxis_title='Month',
yaxis_title='Attended',
height=500
)
fig.show()
Q4: During the busiest months Face-to-Face visits are the most common appointment mode.
Utilisation by Region¶
Region has very little influence on it's own (see below).
# Determine % of appointments cancelled relating to region
region = pd.pivot_table(
data=ar,
values='appointment_count',
index='region',
columns='appointment_status',
aggfunc='sum'
)
region['total'] = region['Attended'] + region['DNA'] # create total
region['utilisation(%)'] = (region['Attended']/ region['total'] * 100).round(2) # create %
region
| appointment_status | Attended | DNA | Unknown | total | utilisation(%) |
|---|---|---|---|---|---|
| region | |||||
| East of England | 31569080 | 1202469 | 1261910 | 32771549 | 96.33 |
| London | 39207133 | 2406467 | 1809280 | 41613600 | 94.22 |
| Midlands | 52562333 | 2557642 | 2194472 | 55119975 | 95.36 |
| North East & Yorkshire | 44172983 | 1890871 | 1799711 | 46063854 | 95.90 |
| North West | 32034002 | 1991351 | 1755759 | 34025353 | 94.15 |
| South East | 41163384 | 1970145 | 1846803 | 43133529 | 95.43 |
| South West | 29770114 | 1261903 | 1324586 | 31032017 | 95.93 |
Month also has little impact on utilisation by itself (see below)
Booking Lead Time Analysis¶
Q5: Most appointments are 'same day' bookings (see below)
plot_format()
# Create a line plot to answer the question.
sns.lineplot(data=ar, x='month',y='appointment_count',hue='booking_lead_time',errorbar=None)
plt.title('Most appointments booked on same day')
plt.xticks(rotation=45)
plt.legend(loc='center left')
plt.savefig('slide_9',dpi=150, bbox_inches='tight')
Clear that same day appointments are most attended and those over 28 days are least attended.
plot_format()
sns.histplot(
data=ar, x='booking_lead_time', weights='appointment_count',
hue='appointment_status', multiple='fill', shrink=0.8, stat='percent'
)
plt.ylabel('Percentage')
plt.xticks(rotation=45)
plt.show()
#plt.savefig('slide_8',dpi=150, bbox_inches='tight');
# Pivot to get monthly utilisation
month_ = (
ar.pivot_table(values='appointment_count',
index='month',
columns='appointment_status',
aggfunc='sum')
.assign(
total=lambda df: df['Attended'] + df['DNA'],
utilisation_pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
)
.reindex(month_order)
)
month_
| appointment_status | Attended | DNA | Unknown | total | utilisation_pct |
|---|---|---|---|---|---|
| month | |||||
| August | 22072706.0 | 944674.0 | 812626.0 | 23017380.0 | 95.90 |
| September | 25745698.0 | 1318269.0 | 1435376.0 | 27063967.0 | 95.13 |
| October | 27158054.0 | 1562316.0 | 1560182.0 | 28720370.0 | 94.56 |
| November | 27649315.0 | 1425547.0 | 1301758.0 | 29074862.0 | 95.10 |
| December | 22840398.0 | 1195213.0 | 1082466.0 | 24035611.0 | 95.03 |
| January | 23580128.0 | 1072474.0 | 955851.0 | 24652602.0 | 95.65 |
| February | 23333376.0 | 1073255.0 | 919910.0 | 24406631.0 | 95.60 |
| March | 27149715.0 | 1287042.0 | 1128660.0 | 28436757.0 | 95.47 |
| April | 21935445.0 | 1042100.0 | 912479.0 | 22977545.0 | 95.46 |
| May | 25318735.0 | 1195368.0 | 945083.0 | 26514103.0 | 95.49 |
| June | 23695459.0 | 1164590.0 | 938130.0 | 24860049.0 | 95.32 |
| July | NaN | NaN | NaN | NaN | NaN |
HCP and Mode Combined¶
Other Staff and Face-to-Face / online signicantly under utilised
hcp_plus_mode= (
ar.pivot_table(
values='appointment_count',
index=['hcp_type','appointment_mode'],
columns='appointment_status',
aggfunc='sum')
.assign(
total = lambda df: df['Attended'] + df['DNA'], # create total
pct = lambda df: (df['Attended']/ df['total'] * 100).round(2)
))
hcp_plus_mode
| appointment_status | Attended | DNA | Unknown | total | pct | |
|---|---|---|---|---|---|---|
| hcp_type | appointment_mode | |||||
| GP | Face-to-Face | 70089048 | 2599222 | 2420398 | 72688270 | 96.42 |
| Home Visit | 647267 | 21000 | 147014 | 668267 | 96.86 | |
| Telephone | 67330728 | 1144185 | 1637126 | 68474913 | 98.33 | |
| Unknown | 1351566 | 44347 | 79520 | 1395913 | 96.82 | |
| Video/Online | 683431 | 21027 | 21484 | 704458 | 97.02 | |
| Other Practice staff | Face-to-Face | 92755796 | 8099216 | 5485883 | 100855012 | 91.97 |
| Home Visit | 686722 | 27058 | 170434 | 713780 | 96.21 | |
| Telephone | 26348843 | 893656 | 1074142 | 27242499 | 96.72 | |
| Unknown | 1216193 | 103202 | 175506 | 1319395 | 92.18 | |
| Video/Online | 726494 | 55709 | 27494 | 782203 | 92.88 | |
| Unknown | Face-to-Face | 1590056 | 45277 | 268344 | 1635333 | 97.23 |
| Home Visit | 160811 | 28131 | 72695 | 188942 | 85.11 | |
| Telephone | 1626647 | 43107 | 80852 | 1669754 | 97.42 | |
| Unknown | 5262142 | 155628 | 331566 | 5417770 | 97.13 | |
| Video/Online | 3285 | 83 | 63 | 3368 | 97.54 |
Lead time and month combined¶
lead_plus_month = (
pd.pivot_table(
data=ar,
values='appointment_count',
index=['booking_lead_time', 'month'],
columns='appointment_status',
aggfunc='sum'
)
.assign(
total = lambda df: df['Attended'] + df['DNA'],
utilisation_percent = lambda df: (df['Attended'] / df['total'] * 100).round(2)
))
lead_plus_month.head(30)
| appointment_status | Attended | DNA | Unknown | total | utilisation_percent | |
|---|---|---|---|---|---|---|
| booking_lead_time | month | |||||
| 1 Day | April | 1943406 | 80115 | 63820 | 2023521 | 96.04 |
| August | 1883711 | 72733 | 60335 | 1956444 | 96.28 | |
| December | 2049032 | 97988 | 71369 | 2147020 | 95.44 | |
| February | 2115779 | 87138 | 68885 | 2202917 | 96.04 | |
| January | 2174201 | 91231 | 65785 | 2265432 | 95.97 | |
| June | 2033281 | 85991 | 67972 | 2119272 | 95.94 | |
| March | 2518219 | 105818 | 84153 | 2624037 | 95.97 | |
| May | 2109755 | 86115 | 69500 | 2195870 | 96.08 | |
| November | 2317942 | 103812 | 83583 | 2421754 | 95.71 | |
| October | 2387826 | 116880 | 92706 | 2504706 | 95.33 | |
| September | 2470440 | 111702 | 82836 | 2582142 | 95.67 | |
| 15 to 21 Days | April | 1474721 | 136060 | 108248 | 1610781 | 91.55 |
| August | 1271577 | 105198 | 75235 | 1376775 | 92.36 | |
| December | 1428329 | 149028 | 130653 | 1577357 | 90.55 | |
| February | 1305166 | 116372 | 101830 | 1421538 | 91.81 | |
| January | 1110841 | 103922 | 96784 | 1214763 | 91.45 | |
| June | 1654592 | 152401 | 107109 | 1806993 | 91.57 | |
| March | 1630248 | 147845 | 127252 | 1778093 | 91.69 | |
| May | 1548172 | 133148 | 99261 | 1681320 | 92.08 | |
| November | 1904536 | 184407 | 161833 | 2088943 | 91.17 | |
| October | 1652333 | 180132 | 172505 | 1832465 | 90.17 | |
| September | 1525576 | 155199 | 187001 | 1680775 | 90.77 | |
| 2 to 7 Days | April | 4090959 | 262892 | 172254 | 4353851 | 93.96 |
| August | 4494484 | 261761 | 167225 | 4756245 | 94.50 | |
| December | 4198816 | 321505 | 219869 | 4520321 | 92.89 | |
| February | 4955040 | 317775 | 192251 | 5272815 | 93.97 | |
| January | 5089519 | 322082 | 208222 | 5411601 | 94.05 | |
| June | 4265542 | 278651 | 161352 | 4544193 | 93.87 | |
| March | 5492887 | 357962 | 227040 | 5850849 | 93.88 | |
| May | 5054848 | 326914 | 182062 | 5381762 | 93.93 |
nc_apps = (
pd.pivot_table(
data=nc,
values='appointment_count',
index='month',
columns='service_setting',
aggfunc='sum'
)
.assign(
total = lambda df: (
df['Primary Care Network'] +
df['Extended Access Provision'] +
df['General Practice'] +
df['Other'] +
df['Unmapped']
)
)
.reindex(month_order)
.drop('July', axis=0)
)
nc_apps
| service_setting | Extended Access Provision | General Practice | Other | Primary Care Network | Unmapped | total |
|---|---|---|---|---|---|---|
| month | ||||||
| August | 160927.0 | 21575852.0 | 449101.0 | 432448.0 | 1233843.0 | 23852171.0 |
| September | 187906.0 | 25940821.0 | 527174.0 | 530485.0 | 1336115.0 | 28522501.0 |
| October | 209539.0 | 27606171.0 | 556487.0 | 564981.0 | 1366656.0 | 30303834.0 |
| November | 207577.0 | 27767889.0 | 558784.0 | 614324.0 | 1256496.0 | 30405070.0 |
| December | 173504.0 | 23008818.0 | 464718.0 | 539479.0 | 954257.0 | 25140776.0 |
| January | 186375.0 | 23583053.0 | 457440.0 | 569044.0 | 839562.0 | 25635474.0 |
| February | 196627.0 | 23305934.0 | 456153.0 | 585300.0 | 811246.0 | 25355260.0 |
| March | 231905.0 | 27187368.0 | 530677.0 | 702176.0 | 942912.0 | 29595038.0 |
| April | 192284.0 | 21916791.0 | 437402.0 | 606270.0 | 760313.0 | 23913060.0 |
| May | 220511.0 | 25238620.0 | 503327.0 | 712280.0 | 820770.0 | 27495508.0 |
| June | 209652.0 | 23680374.0 | 478813.0 | 700599.0 | 758640.0 | 25828078.0 |
# Apply formatting
plot_format()
exclude_cols = ['Unmapped', 'total'] # columns to exclude
ss_filt = nc_apps.drop(columns=exclude_cols)
sns.lineplot(data=ss_filt, marker='o')
plt.xticks(rotation=45)
plt.title('Greatest burden on Surgeries')
plt.tight_layout() # helps prevent label cutoff
# Uncomment to save the figure
# plt.savefig('slide_10.png', dpi=150, bbox_inches='tight')
plt.show()
General Practices account for significant majority of appointments (see above)
EAP use may indicate over capacity for whole year
Perhaps PCNs could be utilised during busy periods
plot_format()
exclude = ['Unmapped','General Practice','total']
ss_filt = nc_apps.drop(columns=exclude)
sns.lineplot(data=ss_filt, markers=True)
plt.xticks(rotation=45)
plt.title('Could other services absorb more capacity?')
plt.legend(loc='upper left')
plt.tight_layout()
#plt.savefig('slide_11a',dpi=150, bbox_inches='tight')
plt.show()
nc.head()
| appointment_date | icb_ons_code | icb_location_name | sub_icb_code | service_setting | context_type | national_category | appointment_count | appointment_month | month | day | region | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Primary Care Network | Care Related Encounter | Patient contact during Care Home Round | 3 | 2021-08 | August | Monday | North East & Yorkshire | 2021 |
| 1 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | Other | Care Related Encounter | Planned Clinics | 7 | 2021-08 | August | Monday | North East & Yorkshire | 2021 |
| 2 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | Home Visit | 79 | 2021-08 | August | Monday | North East & Yorkshire | 2021 |
| 3 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | General Consultation Acute | 725 | 2021-08 | August | Monday | North East & Yorkshire | 2021 |
| 4 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB | 00L | General Practice | Care Related Encounter | Structured Medication Review | 2 | 2021-08 | August | Monday | North East & Yorkshire | 2021 |
Provide a summary of your findings and recommendations based on the analysis.¶
- Overall utilisation of services was calculated at 95.63%. This figure alone suggests there is enough capacity overall in the system, however to evaluate capacity in full more data would be required in relation to working hours, specifically overtime hours. Without this data its impossible to form a definitive conclusion in relation to capacity.
- Overall there is little variance in utilisation during a year It ranges between 94.71% and 96.27%. In terms of appointments overall there is a double peak, from January to March, and a second in June.
# Plot monthly capacity utilisation (slide 1)
plot_format()
sns.lineplot(data=monthly_apps,x='month',y='total',label='Total')
sns.lineplot(data=monthly_apps,x='month',y='Attended', label='Attended')
sns.lineplot(data=monthly_apps,x='month',y='DNA',label='Did Not Attend')
plt.legend()
plt.title('Chart showing 95% attendance')
ax = plt.gca()
plt.title("appointments peak in autumn and march")
plt.xticks(rotation=45)
plt.savefig('slide_1.png', dpi=150,bbox_inches='tight');
- Staffing during busy periods. The increased % of appointments by Other staff, peaking in October, is likely to be linked to vaccinations. Most of which take place in autumn and early winter.
hcp_pivot2 = (
pd.pivot_table(
data=ar,
values='appointment_count',
index='month',
columns='hcp_type',
aggfunc='sum'
)
.assign(
total=lambda df: df['GP'] + df['Other Practice staff'],
**{
'GP(%)': lambda df: (df['GP'] / df['total'] * 100).round(2),
'Other(%)': lambda df: (df['Other Practice staff'] / df['total'] * 100).round(2)
}
)
)
#Create plot
sns.lineplot(data=hcp_pivot2,x='month',y='GP(%)', label='GP')
sns.lineplot(data=hcp_pivot2,x='month',y='Other(%)',label='Other Staff')
plt.legend()
plt.xticks(rotation=45)
plt.title("Pressure on whole practice")
plt.show()
Create a plot staff vs appointment mode. It appears Other staff have many more Face-to-Face visits What are the drivers behind the increase in Face-to-Face visits by both GPs and other staff in the surgery
# hcp and month
hcp = pd.pivot_table(
data=ar,
values='appointment_count',
index=['hcp_type','month'],
columns='appointment_mode',
aggfunc='sum'
)
plot_format()
# Create the catplot
g = sns.catplot(data=ar, x='month', y='appointment_count', hue='appointment_mode', kind='bar', col='hcp_type')
# Rotate x-axis labels for all subplots
for ax in g.axes.flat: # creates a loop to check all subplots
plt.setp(ax.get_xticklabels(), rotation=45)
plt.savefig('slide_18a',dpi=150, bbox_inches='tight')
plt.show()
- Increased demand during busy periods tends to be face to face appointments especially with Other Staff HCP Type. However Face-to-Face / other staff have a relatively low utilisation. A strategy is needed to either decrease face-to-face and or increase utilisation. Online sources indicate vaccinations take place in Autumn / March (covid booster). One solution might be to find alternative vaccination method.
# Reflect on utilisation during this period, can I add hcp_type
sns.lineplot(data=monthly_apps,x='month',y='utilisation(%)',label='Utilisation(%)')
plt.legend()
plt.title('Chart showing 95% attendance')
plt.xticks(rotation=90);
- Consider whether utilisation differs between hcp types and the mode of appointment? The lowest utilisation is for Face-to-Face meetings with Other Practice Staff, which explains previous slide which shows lower utilisation in October when Other Staff have more Face-to-Face appointments.
face_to_face = (
pd.pivot_table(
data=ar,
values='appointment_count',
index=['hcp_type', 'appointment_mode'],
columns='appointment_status',
aggfunc='sum'
)
.drop(index='Unknown', level=0)
.assign(
total=lambda df: df['Attended'] + df['DNA'],
**{
'utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
}
)
.reset_index()
.query("appointment_mode == 'Face-to-Face'")
)
face_to_face
| appointment_status | hcp_type | appointment_mode | Attended | DNA | Unknown | total | utilisation(%) |
|---|---|---|---|---|---|---|---|
| 0 | GP | Face-to-Face | 70089048 | 2599222 | 2420398 | 72688270 | 96.42 |
| 5 | Other Practice staff | Face-to-Face | 92755796 | 8099216 | 5485883 | 100855012 | 91.97 |
Other factors that affect utilisation: 'lead booking time' is a key factor.
# Pivot table to show utilisation (overall)
lead_time = (
pd.pivot_table(
data=ar,
values='appointment_count',
index=['booking_lead_time'],
columns='appointment_status',
aggfunc='sum'
)
.assign(
total=lambda df: df['Attended'] + df['DNA'],
**{
'Utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
}
)
.reset_index()
.pipe(lead_time_order)
)
lead_time
| appointment_status | booking_lead_time | Attended | DNA | Unknown | total | Utilisation(%) |
|---|---|---|---|---|---|---|
| 6 | Same Day | 125796242 | 2284169 | 2883749 | 128080411 | 98.22 |
| 0 | 1 Day | 24003592 | 1039523 | 810944 | 25043115 | 95.85 |
| 2 | 2 to 7 Days | 54636124 | 3689732 | 2394373 | 58325856 | 93.67 |
| 4 | 8 to 14 Days | 32523007 | 2864860 | 2102542 | 35387867 | 91.90 |
| 1 | 15 to 21 Days | 16506091 | 1563712 | 1367711 | 18069803 | 91.35 |
| 3 | 22 to 28 Days | 9474372 | 931458 | 1011689 | 10405830 | 91.05 |
| 5 | More than 28 Days | 7539601 | 907394 | 1421513 | 8446995 | 89.26 |
# For overall pattern see conclusions
# Pivot table to show utilisation (GP)
lead_time_GP = (
pd.pivot_table(
data=ar[ar['hcp_type'] == 'GP'],
values='appointment_count',
index=['booking_lead_time'],
columns='appointment_status',
aggfunc='sum'
)
.assign(
total=lambda df: df['Attended'] + df['DNA'],
**{
'Utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
}
)
.reset_index()
.pipe(lead_time_order)
)
lead_time
| appointment_status | booking_lead_time | Attended | DNA | Unknown | total | Utilisation(%) |
|---|---|---|---|---|---|---|
| 6 | Same Day | 125796242 | 2284169 | 2883749 | 128080411 | 98.22 |
| 0 | 1 Day | 24003592 | 1039523 | 810944 | 25043115 | 95.85 |
| 2 | 2 to 7 Days | 54636124 | 3689732 | 2394373 | 58325856 | 93.67 |
| 4 | 8 to 14 Days | 32523007 | 2864860 | 2102542 | 35387867 | 91.90 |
| 1 | 15 to 21 Days | 16506091 | 1563712 | 1367711 | 18069803 | 91.35 |
| 3 | 22 to 28 Days | 9474372 | 931458 | 1011689 | 10405830 | 91.05 |
| 5 | More than 28 Days | 7539601 | 907394 | 1421513 | 8446995 | 89.26 |
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))
# Plotting the bar plots for Attended and DNA
sns.barplot(data=lead_time_GP, x='booking_lead_time', y='Attended', ax=ax1,
color='orange', label='Attended',ci=None)
sns.barplot(data=lead_time_GP, x='booking_lead_time', y='DNA', ax=ax1,
color='blue', label='DNA',ci=None)
# Add a secondary y-axis for the utilisation line plot
ax2 = ax1.twinx()
# Plot the line for utilisation percentage
sns.lineplot(data=lead_time_GP, x='booking_lead_time', y='Utilisation(%)', ax=ax2,
color='green', label='Utilisation(%)', marker='o', linestyle='-', linewidth=2,ci=None)
# Set labels and title
ax1.set_xlabel('Booking Lead Time')
ax1.set_ylabel('Count')
ax2.set_ylabel('Utilisation (%)')
# Add legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.2,1))
ax2.legend(loc='upper right')
# Rotate x-axis labels
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')
plt.title('Attendance and booking lead times (GP)')
plt.xticks(rotation=45)
plt.savefig('slide_15.png', dpi=150, bbox_inches='tight')
plt.show();
/var/folders/_g/lq327r795816_cw0zgqv50300000gn/T/ipykernel_8677/2078063227.py:27: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
# Pivot table to show utilisation (HCP)
lead_time_HCP = (
pd.pivot_table(
data=ar[ar['hcp_type'] == 'Other Practice staff'],
values='appointment_count',
index=['booking_lead_time'],
columns='appointment_status',
aggfunc='sum'
)
.assign(
total=lambda df: df['Attended'] + df['DNA'],
**{
'Utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
}
)
.reset_index()
.pipe(lead_time_order)
)
lead_time
| appointment_status | booking_lead_time | Attended | DNA | Unknown | total | Utilisation(%) |
|---|---|---|---|---|---|---|
| 6 | Same Day | 125796242 | 2284169 | 2883749 | 128080411 | 98.22 |
| 0 | 1 Day | 24003592 | 1039523 | 810944 | 25043115 | 95.85 |
| 2 | 2 to 7 Days | 54636124 | 3689732 | 2394373 | 58325856 | 93.67 |
| 4 | 8 to 14 Days | 32523007 | 2864860 | 2102542 | 35387867 | 91.90 |
| 1 | 15 to 21 Days | 16506091 | 1563712 | 1367711 | 18069803 | 91.35 |
| 3 | 22 to 28 Days | 9474372 | 931458 | 1011689 | 10405830 | 91.05 |
| 5 | More than 28 Days | 7539601 | 907394 | 1421513 | 8446995 | 89.26 |
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))
# Plotting the bar plots for Attended and DNA
sns.barplot(data=lead_time_HCP, x='booking_lead_time', y='Attended', ax=ax1,
color='orange', label='Attended',ci=None)
sns.barplot(data=lead_time_HCP, x='booking_lead_time', y='DNA', ax=ax1,
color='blue', label='DNA',ci=None)
# Add a secondary y-axis for the utilisation line plot
ax2 = ax1.twinx()
# Plot the line for utilisation percentage
sns.lineplot(data=lead_time_HCP, x='booking_lead_time', y='Utilisation(%)', ax=ax2,
color='green', label='Utilisation(%)', marker='o', linestyle='-', linewidth=2,ci=None)
# Set labels and title
ax1.set_xlabel('Booking Lead Time')
ax1.set_ylabel('Count')
ax2.set_ylabel('Utilisation (%)')
# Add legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.2,1))
ax2.legend(loc='upper right')
# Rotate x-axis labels
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')
plt.title('Attendance and booking lead times (Other Staff)')
plt.xticks(rotation=45)
plt.savefig('slide_15.png', dpi=150, bbox_inches='tight')
plt.show();
/var/folders/_g/lq327r795816_cw0zgqv50300000gn/T/ipykernel_8677/1862548190.py:27: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
Other findings. Most appointments are 6-10 minutes long. Tuesday and Wednesday are the most popular day for appointments. I estimated average appointment to be 18.6 minutes. Most appointments are routine consultations. Planned clinics and planned clinical procedures increase in March. Twin peaks in Autumn and March.HCP type is a significant factor in terms of utilisation. This is exacerbated in relation to 'Other Practice Staff'. Increase in winter spring home visits.
Additional Charts¶
lead_time_4 = (
pd.pivot_table(
data = ar[ar['appointment_status'] == 'Unknown'], # Filter first
values='appointment_count',
index='booking_lead_time',
columns='appointment_status',
aggfunc='sum'
)
.reset_index()
.pipe(lead_time_order)
)
lead_time_4
| appointment_status | booking_lead_time | Unknown |
|---|---|---|
| 6 | Same Day | 2883749 |
| 0 | 1 Day | 810944 |
| 2 | 2 to 7 Days | 2394373 |
| 4 | 8 to 14 Days | 2102542 |
| 1 | 15 to 21 Days | 1367711 |
| 3 | 22 to 28 Days | 1011689 |
| 5 | More than 28 Days | 1421513 |
#Unknown values mirror plot above suggesting this is not the cause of 1 day anomaly
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))
sns.barplot(data=lead_time_4, x='booking_lead_time', y='Unknown')
# Ensure 'booking_lead_time' is categorical with the correct order
lead_time_4['booking_lead_time'] = pd.Categorical(lead_time_4['booking_lead_time'],
categories=day_order2, ordered=True)
plt.title('Distribution of unknown values')
plt.savefig('slide_17.png',dpi=150, bbox_inches='tight')
region_ = (
ar.pivot_table(
values='appointment_count',
index='region',
columns='appointment_status',
aggfunc='sum'
)
.assign(
total=lambda df: df['Attended'] + df['DNA'],
pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
)
.reset_index()
)
region_
| appointment_status | region | Attended | DNA | Unknown | total | pct |
|---|---|---|---|---|---|---|
| 0 | East of England | 31569080 | 1202469 | 1261910 | 32771549 | 96.33 |
| 1 | London | 39207133 | 2406467 | 1809280 | 41613600 | 94.22 |
| 2 | Midlands | 52562333 | 2557642 | 2194472 | 55119975 | 95.36 |
| 3 | North East & Yorkshire | 44172983 | 1890871 | 1799711 | 46063854 | 95.90 |
| 4 | North West | 32034002 | 1991351 | 1755759 | 34025353 | 94.15 |
| 5 | South East | 41163384 | 1970145 | 1846803 | 43133529 | 95.43 |
| 6 | South West | 29770114 | 1261903 | 1324586 | 31032017 | 95.93 |
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))
# Plotting the bar plots for Attended and DNA
sns.barplot(data=region_, x='region', y='Attended', ax=ax1, color='orange', label='Attended')
sns.barplot(data=region_, x='region', y='DNA', ax=ax1, color='blue', label='DNA')
# Add a secondary y-axis for the utilisation line plot
ax2 = ax1.twinx()
# Plot the line for utilisation percentage
sns.lineplot(data=region_, x='region', y='pct', ax=ax2, color='green', label='Utilisation(%)',
marker='o', linestyle='-', linewidth=2)
# Set labels and title
ax1.set_xlabel('region')
ax1.set_ylabel('Count')
ax2.set_ylabel('Utilisation (%)')
# Add legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.1, 1))
ax2.legend(loc='upper right')
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')
plt.title('Attendance and utilisation by region')
plt.savefig('slide_13.png',dpi=150, bbox_inches='tight')
/var/folders/_g/lq327r795816_cw0zgqv50300000gn/T/ipykernel_8677/1999816355.py:23: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
# Create a line plot to answer the question.
hcp_pivot = (
pd.pivot_table(
data=ar,
values='appointment_count',
index='month',
columns='hcp_type',
aggfunc='sum'
)
.assign(
total=lambda df: df['GP'] + df['Other Practice staff'],
**{
'GP(%)': lambda df: (df['GP'] / df['total'] * 100).round(2),
'Other(%)': lambda df: (df['Other Practice staff'] / df['total'] * 100).round(2)
}
)
.reindex(month_order)
)
hcp_pivot
# greater variance
| hcp_type | GP | Other Practice staff | Unknown | total | GP(%) | Other(%) |
|---|---|---|---|---|---|---|
| month | ||||||
| August | 12290112.0 | 10786456.0 | 753438.0 | 23076568.0 | 53.26 | 46.74 |
| September | 14477314.0 | 13116077.0 | 905952.0 | 27593391.0 | 52.47 | 47.53 |
| October | 14290342.0 | 14931278.0 | 1058932.0 | 29221620.0 | 48.90 | 51.10 |
| November | 14883843.0 | 14420204.0 | 1072573.0 | 29304047.0 | 50.79 | 49.21 |
| December | 12644693.0 | 11603844.0 | 869540.0 | 24248537.0 | 52.15 | 47.85 |
| January | 13123157.0 | 11694696.0 | 790600.0 | 24817853.0 | 52.88 | 47.12 |
| February | 12857137.0 | 11674435.0 | 794969.0 | 24531572.0 | 52.41 | 47.59 |
| March | 15113002.0 | 13515711.0 | 936704.0 | 28628713.0 | 52.79 | 47.21 |
| April | 11898608.0 | 11206857.0 | 784559.0 | 23105465.0 | 51.50 | 48.50 |
| May | 13780066.0 | 12803339.0 | 875781.0 | 26583405.0 | 51.84 | 48.16 |
| June | 12879089.0 | 12093451.0 | 825639.0 | 24972540.0 | 51.57 | 48.43 |
| July | NaN | NaN | NaN | NaN | NaN | NaN |
plot_format()
sns.lineplot(data=hcp_pivot,x='month',y='GP(%)', label='GP')
sns.lineplot(data=hcp_pivot,x='month',y='Other(%)',label='Other Staff')
plt.legend()
plt.title('Chart showing...')
plt.xticks(rotation=90)
plt.title("Pressure on whole practice")
Text(0.5, 1.0, 'Pressure on whole practice')
Other stuff¶
# Create a line plot to answer the question (possibly create a facet by month later on?)
pivot_data = pd.pivot_table(
data=ar,
values='appointment_count',
index=['month','appointment_status'],
columns=['appointment_mode'],
aggfunc='sum'
)
pivot_data
| appointment_mode | Face-to-Face | Home Visit | Telephone | Unknown | Video/Online | |
|---|---|---|---|---|---|---|
| month | appointment_status | |||||
| April | Attended | 13588356 | 132294 | 7492249 | 605161 | 117385 |
| DNA | 841473 | 5221 | 169628 | 19795 | 5983 | |
| Unknown | 607970 | 32515 | 225109 | 44059 | 2826 | |
| August | Attended | 12481186 | 113533 | 8724392 | 652054 | 101541 |
| DNA | 723111 | 6549 | 184850 | 25028 | 5136 | |
| Unknown | 500735 | 29571 | 233009 | 46233 | 3078 | |
| December | Attended | 13597975 | 132406 | 8289616 | 706247 | 114154 |
| DNA | 971685 | 8320 | 175815 | 32247 | 7146 | |
| Unknown | 730885 | 38550 | 250753 | 58330 | 3948 | |
| February | Attended | 14059436 | 129188 | 8388916 | 637750 | 118086 |
| DNA | 858543 | 6069 | 179837 | 22675 | 6131 | |
| Unknown | 601847 | 32563 | 239805 | 42649 | 3046 | |
| January | Attended | 13921132 | 127718 | 8776010 | 634792 | 120476 |
| DNA | 851572 | 7413 | 182045 | 24950 | 6494 | |
| Unknown | 627739 | 34022 | 245226 | 45543 | 3321 | |
| June | Attended | 15154420 | 144698 | 7653457 | 616413 | 126471 |
| DNA | 947860 | 5796 | 185570 | 18815 | 6549 | |
| Unknown | 626671 | 34094 | 233348 | 41140 | 2877 | |
| March | Attended | 16536414 | 152788 | 9579993 | 739858 | 140662 |
| DNA | 1035050 | 6369 | 212548 | 25876 | 7199 | |
| Unknown | 747627 | 38395 | 286274 | 52581 | 3783 | |
| May | Attended | 16020354 | 153939 | 8350132 | 661023 | 133287 |
| DNA | 968223 | 6632 | 193483 | 20226 | 6804 | |
| Unknown | 619678 | 37335 | 240340 | 44158 | 3572 | |
| November | Attended | 16949964 | 149543 | 9497883 | 905559 | 146366 |
| DNA | 1167488 | 8729 | 201527 | 39208 | 8595 | |
| Unknown | 899217 | 41973 | 285646 | 69807 | 5115 | |
| October | Attended | 16982958 | 132144 | 8978200 | 907847 | 156905 |
| DNA | 1312707 | 7574 | 190570 | 42565 | 8900 | |
| Unknown | 1164314 | 36299 | 273141 | 78215 | 8213 | |
| September | Attended | 15142705 | 126549 | 9575370 | 763197 | 137877 |
| DNA | 1066003 | 7517 | 205075 | 31792 | 7882 | |
| Unknown | 1047942 | 34826 | 279469 | 63877 | 9262 |
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Face-to-Face',hue='appointment_status')
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Home Visit',hue='appointment_status')
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Telephone',hue='appointment_status')
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Video/Online',hue='appointment_status')
<Axes: xlabel='month', ylabel='Face-to-Face'>
# Determine % of appointments cancelled relating to hcp_type
hcp = pd.pivot_table(
data=ar,
values='appointment_count',
index=['hcp_type','appointment_mode'],
columns='appointment_status',
aggfunc='sum'
)
hcp['total'] = hcp['Attended'] + hcp['DNA']
# Created a column showing % of appointments used
hcp['utilisation(%)'] = (hcp['Attended']/ hcp['total'] * 100).round(2)
hcp
| appointment_status | Attended | DNA | Unknown | total | utilisation(%) | |
|---|---|---|---|---|---|---|
| hcp_type | appointment_mode | |||||
| GP | Face-to-Face | 70089048 | 2599222 | 2420398 | 72688270 | 96.42 |
| Home Visit | 647267 | 21000 | 147014 | 668267 | 96.86 | |
| Telephone | 67330728 | 1144185 | 1637126 | 68474913 | 98.33 | |
| Unknown | 1351566 | 44347 | 79520 | 1395913 | 96.82 | |
| Video/Online | 683431 | 21027 | 21484 | 704458 | 97.02 | |
| Other Practice staff | Face-to-Face | 92755796 | 8099216 | 5485883 | 100855012 | 91.97 |
| Home Visit | 686722 | 27058 | 170434 | 713780 | 96.21 | |
| Telephone | 26348843 | 893656 | 1074142 | 27242499 | 96.72 | |
| Unknown | 1216193 | 103202 | 175506 | 1319395 | 92.18 | |
| Video/Online | 726494 | 55709 | 27494 | 782203 | 92.88 | |
| Unknown | Face-to-Face | 1590056 | 45277 | 268344 | 1635333 | 97.23 |
| Home Visit | 160811 | 28131 | 72695 | 188942 | 85.11 | |
| Telephone | 1626647 | 43107 | 80852 | 1669754 | 97.42 | |
| Unknown | 5262142 | 155628 | 331566 | 5417770 | 97.13 | |
| Video/Online | 3285 | 83 | 63 | 3368 | 97.54 |
# https://www.bbc.co.uk/news/articles/cvgpp0ze478o
Twitter text analysis¶
# Load data set
twts = pd.read_csv('tweets.csv')
# View DataFrame
twts.sort_values('tweet_favorite_count', ascending=False).head(3)
| tweet_id | tweet_full_text | tweet_entities | tweet_entities_hashtags | tweet_metadata | tweet_retweet_count | tweet_favorite_count | tweet_favorited | tweet_retweeted | tweet_lang | |
|---|---|---|---|---|---|---|---|---|---|---|
| 117 | 1567583855422611461 | Lipid-Lowering Drugs\n\n#TipsForNewDocs #MedEd... | {'hashtags': [{'text': 'TipsForNewDocs', 'indi... | #TipsForNewDocs, #MedEd, #MedTwitter, #medicin... | {'iso_language_code': 'en', 'result_type': 're... | 12 | 42 | False | False | en |
| 433 | 1567582427719282689 | You ready for $JCO @_JennyCo ❤️\n\n#Healthcare... | {'hashtags': [{'text': 'Healthcare', 'indices'... | #Healthcare | {'iso_language_code': 'en', 'result_type': 're... | 1 | 28 | False | False | en |
| 205 | 1567634936341069826 | How health insurance works 😂 \n\n#comedy #adul... | {'hashtags': [{'text': 'comedy', 'indices': [3... | #comedy, #adulting, #healthcare | {'iso_language_code': 'en', 'result_type': 're... | 5 | 20 | False | False | en |
# Explore the metadata.
twts.dtypes
tweet_id int64 tweet_full_text object tweet_entities object tweet_entities_hashtags object tweet_metadata object tweet_retweet_count int64 tweet_favorite_count int64 tweet_favorited bool tweet_retweeted bool tweet_lang object dtype: object
Clean and explore the data
# Check data types and null values
twts.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1174 entries, 0 to 1173 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 tweet_id 1174 non-null int64 1 tweet_full_text 1174 non-null object 2 tweet_entities 1174 non-null object 3 tweet_entities_hashtags 1007 non-null object 4 tweet_metadata 1174 non-null object 5 tweet_retweet_count 1174 non-null int64 6 tweet_favorite_count 1174 non-null int64 7 tweet_favorited 1174 non-null bool 8 tweet_retweeted 1174 non-null bool 9 tweet_lang 1174 non-null object dtypes: bool(2), int64(3), object(5) memory usage: 75.8+ KB
# Drop rows containing nulls
twts.dropna(inplace=True)
# Check deletions worked
twts.isnull().sum()
tweet_id 0 tweet_full_text 0 tweet_entities 0 tweet_entities_hashtags 0 tweet_metadata 0 tweet_retweet_count 0 tweet_favorite_count 0 tweet_favorited 0 tweet_retweeted 0 tweet_lang 0 dtype: int64
# Check deletions worked
twts.isnull().sum()
tweet_id 0 tweet_full_text 0 tweet_entities 0 tweet_entities_hashtags 0 tweet_metadata 0 tweet_retweet_count 0 tweet_favorite_count 0 tweet_favorited 0 tweet_retweeted 0 tweet_lang 0 dtype: int64
# Check for unique values
twts.nunique()
tweet_id 1007 tweet_full_text 857 tweet_entities 854 tweet_entities_hashtags 710 tweet_metadata 1 tweet_retweet_count 35 tweet_favorite_count 20 tweet_favorited 1 tweet_retweeted 1 tweet_lang 1 dtype: int64
# Create new DataFrame containing only the text.
twt_2 = twts[['tweet_full_text','tweet_entities_hashtags']]
# View the DataFrame.
twt_2.head(10)
| tweet_full_text | tweet_entities_hashtags | |
|---|---|---|
| 16 | RT @imedverse: I.V Drug Calculations Cheat She... | #TipsForNewDocs, #MedEd, #MedTwitter, #medicin... |
| 17 | RT @Khulood_Almani: 🔟#Applications of #AI in #... | #Applications, #AI, #healthcare, #digitalhealt... |
| 18 | RT @Khulood_Almani: 🔟#Applications of #AI in #... | #Applications, #AI, #healthcare, #digitalhealt... |
| 19 | RT @Khulood_Almani: 🔟#Applications of #AI in #... | #Applications, #AI, #healthcare, #digitalhealt... |
| 20 | RT @Khulood_Almani: 🔟#Applications of #AI in #... | #Applications, #AI, #healthcare, #digitalhealt... |
| 21 | RT @Khulood_Almani: #Healthcare #DigitalTransf... | #Healthcare, #DigitalTransformation, #digitalh... |
| 22 | RT @SoniaFurstenau: Germany has one of the top... | #healthcare |
| 23 | RT @SoniaFurstenau: Germany has one of the top... | #healthcare |
| 24 | RT @SoniaFurstenau: Germany has one of the top... | #healthcare |
| 25 | RT @imedverse: Features of Lung Diseases\n\n#p... | #pulmonary, #pulmtwitter, #pulmonology, #pulmo... |
# find individual hashtags
hashtags_list = twt_2['tweet_full_text'].str.findall(r'#\w+')
# create flattened list
all_hashtags = [x for sublist in hashtags_list for x in sublist]
#print(all_hashtags)
# Create a counter object (dictionary-like)
from collections import Counter
counts = Counter(all_hashtags)
# Transform counter object into a DataFrame
hashtag_df = pd.DataFrame(counts.items(), columns=['Hashtag','Count']).sort_values(by='Count', ascending=False)
# Display records where the count is larger than 25
hashtag_abv_10 = hashtag_df[hashtag_df['Count'] > 25]
hashtag_abv_10
| Hashtag | Count | |
|---|---|---|
| 8 | #healthcare | 599 |
| 15 | #Healthcare | 198 |
| 55 | #health | 70 |
| 28 | #HealthCare | 47 |
| 7 | #AI | 45 |
| 3 | #medicine | 42 |
| 203 | #job | 38 |
| 346 | #strategy | 31 |
| 4 | #medical | 30 |
# Create the plot.
plot_format()
sns.barplot(data=hashtag_abv_10, y='Hashtag', x='Count')
plt.xticks(rotation=90)
plt.title('Most popular hashtags')
plt.show()
# Display records where the count is larger than 10 but less than 47 (to preclude health/healthcare)
hashtag_mid = hashtag_df[(hashtag_df['Count'] > 10) & (hashtag_df['Count'] <= 45)]
# Create a new plot filtering out health / healthcare
plt.figure(figsize=(10,8))
sns.barplot(data=hashtag_mid, y='Hashtag', x='Count')
plt.xticks(rotation=90)
plt.title('Most popular hashtags (minus "health")')
plt.savefig('slide_7.png',dpi=150, bbox_inches='tight')
plt.show()
# View the barplot.
I removed health as it was an umbrella term but the above leads to few useful insights. One recommendation would be to review all the #TipsForNewDocs and perhaps #PatientCare for possible improvements. As a data analyst I was most interested by reviewing the Python hashtags (see below).
# Filter for rows containing #TipsForNewDocs
tips_tweets = twts[twts['tweet_entities_hashtags'].str.contains('#Python', na=False)]
tips_tweets.head()
| tweet_id | tweet_full_text | tweet_entities | tweet_entities_hashtags | tweet_metadata | tweet_retweet_count | tweet_favorite_count | tweet_favorited | tweet_retweeted | tweet_lang | |
|---|---|---|---|---|---|---|---|---|---|---|
| 21 | 1567609182001545221 | RT @Khulood_Almani: #Healthcare #DigitalTransf... | {'hashtags': [{'text': 'Healthcare', 'indices'... | #Healthcare, #DigitalTransformation, #digitalh... | {'iso_language_code': 'en', 'result_type': 're... | 107 | 0 | False | False | en |
| 129 | 1567655135421108226 | RT @pratititech: #AI tools in #healthcare!\n\n... | {'hashtags': [{'text': 'AI', 'indices': [17, 2... | #AI, #healthcare, #Python, #Nodejs, #MachineLe... | {'iso_language_code': 'en', 'result_type': 're... | 9 | 0 | False | False | en |
| 130 | 1567654980970217472 | RT @pratititech: #AI tools in #healthcare!\n\n... | {'hashtags': [{'text': 'AI', 'indices': [17, 2... | #AI, #healthcare, #Python, #Nodejs, #MachineLe... | {'iso_language_code': 'en', 'result_type': 're... | 9 | 0 | False | False | en |
| 131 | 1567654978881470464 | RT @pratititech: #AI tools in #healthcare!\n\n... | {'hashtags': [{'text': 'AI', 'indices': [17, 2... | #AI, #healthcare, #Python, #Nodejs, #MachineLe... | {'iso_language_code': 'en', 'result_type': 're... | 9 | 0 | False | False | en |
| 132 | 1567654962318180354 | RT @pratititech: #AI tools in #healthcare!\n\n... | {'hashtags': [{'text': 'AI', 'indices': [17, 2... | #AI, #healthcare, #Python, #Nodejs, #MachineLe... | {'iso_language_code': 'en', 'result_type': 're... | 9 | 0 | False | False | en |
Key Findings¶
Overall Resource Utilisation: 95.3% system-wide, with seasonal peaks in autumn (October-November) and March. During peak periods, 10-10.3% of total annual appointments occur monthly, with utilisation paradoxically lowest in October (94.6%) when demand is highest.
Service Distribution: 98% of appointments occur on weekdays, with 91.5% delivered through General Practices. National categories show: Clinical Triage (14%), General Consultation Acute (18%), and General Consultation Routine (33%). The high proportion of Clinical Triage indicates significant demand pressure and access challenges.
Seasonal Patterns: Autumn and March peaks create substantial pressure on all staff categories, with work reallocation proving difficult during these periods. GP-driven demand particularly evident in March, while autumn pressures affect broader practice teams.
Conclusion 1: Booking Lead Time Impact¶
Finding: Clear inverse relationship between booking lead time and utilisation - shorter intervals between booking and appointments correlate with reduced missed appointments.
Recommendation: Limit appointments booked more than 28 days in advance. Develop automated contact systems (potentially AI-enabled) to confirm patient requirements for advance bookings.
Follow-up: Investigate the clinical nature and necessity of missed long-lead appointments.
# Super simple version
fig = go.Figure([
go.Bar(x=lead_time['booking_lead_time'], y=lead_time['Attended'],
name='Attended', marker_color='orange'),
go.Bar(x=lead_time['booking_lead_time'], y=lead_time['DNA'],
name='DNA', marker_color='blue'),
go.Scatter(x=lead_time['booking_lead_time'], y=lead_time['Utilisation(%)'],
mode='lines+markers', name='Utilisation(%)', yaxis='y2',
line_color='green')
])
fig.update_layout(
yaxis2=dict(overlaying='y', side='right'),
title='Attendance and booking lead times (All Staff)',
barmode='stack'
)
# Apply your aesthetic formatting
format_plotly_chart(fig).show()
Conclusion 2: Practitioner and Appointment Mode Utilisation¶
Finding: Healthcare practitioner type significantly affects utilisation, particularly for face-to-face appointments. GP attendance rates reach 96.5% compared to 92% for other practice staff, representing 2.5 million lost hours annually for non-GP face-to-face appointments.
Recommendation: Investigate underlying causes of lower utilisation for non-GP face-to-face appointments, particularly whether these represent routine health checks that patients deprioritise.
Follow-up: Analyse appointment types and patient demographics for non-GP missed appointments before implementing targeted interventions.
# Determine % of appointments cancelled relating to hcp_type
{
"tags": [
"hide-input",
]
}
hcp_f2f = (ar
.query("hcp_type != 'Unknown' and appointment_status != 'Unknown'") # Filter out Unknowns
.pivot_table(
values='appointment_count',
index=['hcp_type', 'appointment_mode'],
columns='appointment_status',
aggfunc='sum'
)
.assign(
total=lambda x: x['Attended'] + x['DNA'],
utilisation_pct=lambda x: (x['Attended'] / x['total'] * 100).round(2)
)
.reset_index()
.query("appointment_mode == 'Face-to-Face'")
)
hcp_f2f
| appointment_status | hcp_type | appointment_mode | Attended | DNA | total | utilisation_pct |
|---|---|---|---|---|---|---|
| 0 | GP | Face-to-Face | 70089048 | 2599222 | 72688270 | 96.42 |
| 5 | Other Practice staff | Face-to-Face | 92755796 | 8099216 | 100855012 | 91.97 |
# Create subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
# First subplot - Utilisation percentage
sns.barplot(data=hcp_f2f,
y='utilisation_pct',
x='hcp_type',
hue='appointment_mode',
ax=ax1,
)
ax1.set_title('Face-to-Face Utilisation(%)')
ax1.set_xlabel('Staff Type')
ax1.set_ylabel('Utilisation (%)')
# Add labels to first subplot
for c in ax1.containers:
ax1.bar_label(c, fmt='%.2f%%')
# Second subplot - Total appointments
sns.barplot(data=hcp_f2f,
y='DNA',
x='hcp_type',
hue='appointment_mode',
ax=ax2,
)
ax2.set_title('Missed Face-to-Face Appointments')
ax2.set_xlabel('Staff Type')
ax2.set_ylabel('Total Appointments')
# Remove legend from second subplot
ax2.get_legend().remove()
# Add labels to second subplot
for c in ax2.containers:
ax2.bar_label(c, fmt=lambda x: f'{x/1e6:.1f}M')
# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()
fig = go.Figure()
# Add each line directly from the pivot table
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Face-to-Face', 'Attended (%)')],
mode='lines+markers',
name='Face to Face'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Home Visit', 'Attended (%)')],
mode='lines+markers',
name='Home Visit'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Telephone','Attended (%)')],
mode='lines+markers',
name='Telephone'
))
fig.add_trace(go.Scatter(
x=mode_pivot.index,
y=mode_pivot[('Video/Online','Attended (%)')],
mode='lines+markers',
name='Video'
))
fig.update_layout(
title='Appointments Attended by Mode (%)',
xaxis_title='Month',
yaxis_title='Attended (%)',
height=500
)
format_plotly_chart(fig).show()
Conclusion 3: Clinical Triage as Capacity Indicator¶
Finding: Clinical triages comprise 14% of appointments, with higher proportions in individual GP surgeries compared to Primary Care Networks (PCNs). This differential suggests PCNs operate more efficiently, as clinical triage typically indicates system pressure.
Recommendation: Accelerate GP surgery integration into PCNs to improve operational efficiency and reduce pressure-driven triaging.
Follow-up: Investigate alternative explanations for triage differentials between service models.
# Sort your data to match the month order before plotting
national_filtered['month'] = pd.Categorical(national_filtered['month'],
categories=month_order, ordered=True)
# aggregate for plotly
df = national_filtered.groupby(['national_category', 'month'],
as_index=False)['appointment_count'].sum()
# Filter out July from the data
df_no_july = df[df['month'] != 'July']
# Create the line plot
fig = px.line(df_no_july,
x='month',
y='appointment_count',
color='national_category',
title='Autumn and March Planned Procedures and Clinics')
# Update layout for legend and x-axis rotation
fig.update_layout(
legend=dict(
orientation="v", # vertical orientation
yanchor="middle",
y=0.5,
xanchor="left",
x=1.2
),
)
format_plotly_chart(fig).show()
fig = go.Figure()
fig.add_trace(go.Bar(
x=triage2['service_setting'],
y=triage2['pct'],
marker_color='blue', # Direct color assignment
name='Other'
))
fig.update_layout(
title='Clinical Triage Appointments by Service Setting',
xaxis_title='Region',
yaxis_title='Appointment Count'
)
format_plotly_chart(fig).show()
Conclusion 4: Extended Access Provision (EAP) Seasonal Response¶
Finding: EAP absorbs additional seasonal demand, with autumn increases driven by planned clinics and vaccinations, while March peaks reflect acute consultation needs. This demonstrates system flexibility but also indicates capacity strain during predictable periods.
Recommendation: Explore innovative vaccination delivery methods (micro-needle patches, oral vaccinations currently in clinical trials) through pharmacy networks or home delivery to reduce surgery-based vaccination burden.
Follow-up: Pilot alternative vaccination delivery models within PCN frameworks.
# Create the line plot
fig = px.line(filtered_natcat4)
# Update layout with title and formatting
fig.update_layout(
title='Extended Access Provision Appointments',
xaxis_tickangle=45,
legend=dict(
orientation="h", # horizontal legend
yanchor="top",
y=-0.4,
xanchor="left",
x=0
),
margin=dict(b=100) # Add bottom margin for legend space
)
# Show the plot
format_plotly_chart(fig).show()
Overall Strategic Conclusion¶
This analysis reveals a primary care system operating at high utilisation (95.3%) but with identifiable inefficiencies that compound during predictable seasonal peaks. The four key recommendations address different layers of the system: operational practices (booking patterns), service delivery models (practitioner-specific utilisation), structural organisation (PCN integration), and innovative capacity management (alternative vaccination delivery). Together, these interventions target the fundamental challenge identified in the research questions: while the system demonstrates adequate overall capacity, resource allocation and utilisation patterns create pressure points that reduce effectiveness. The high proportion of clinical triage (14%) and seasonal capacity strain indicate a system managing demand reactively rather than strategically. The recommendations collectively propose a shift toward more proactive, integrated primary care delivery. Better booking practices reduce waste, PCN integration improves operational efficiency, and innovative service delivery methods address predictable seasonal demands. This multi-layered approach acknowledges that NHS capacity challenges require both immediate operational improvements and longer-term structural changes to create sustainable, patient-centered primary care services.
Limitations: The 3 data sets could not be merged limiting the analysis as it was not possible to utilise all variables in combination.
Accurately evaluating capacity would require more information, such as staff numbers overtime and unpaid hours, to assess whether the staff are already working beyond their capacity. This would enable the calculation of capacity e.g contractual hours x staff numbers.